Günter Neumann

Textual Inference

EXCITEMENT Open Platform

The EXCITEMENT Open Platform (EOP) is an open-source software platform containing state-of-the-art algorithms for recognizing textual entailment relations: given two text fragments, one called the text and the other called the hypothesis, the task is to recognize whether the hypothesis can be inferred from the text.

Written in Java, EOP is a main product of the EXCITEMENT project (EXploring Customer Interactions through Textual EntailMENT), which is funded by the European Commission under the European Union's Seventh Framework Programme (FP7). EOP is designed to be efficient and extensible. Highlights include:

  • Separation between Linguistic Analysis Pipelines and Entailment Components
  • Supporting modularity and interoperability among Components
  • Java API with source code
  • Pre-trained multilingual models (English, German, Italian)
  • Trainable with new sample data
  • Detailed documentation of the structure and implementation of EOP
  • Quick Start documentation for using EOP right away
  • An archive of results and configuration files shared among users

TIE - Textual Inference Engine

As part of EOP, we have developed TIE, a lightweight tool for recognizing textual entailment based on the Maximum Entropy modeling framework. For details on TIE, see this link and this one.
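TIE's actual feature set and training procedure are not described on this page. As a rough illustration of the modeling idea only, the sketch below trains a two-class maximum entropy model (equivalent to binary logistic regression) that scores a text/hypothesis pair. The feature function, the toy pairs in `TOY_PAIRS`, and all function names are hypothetical; a real entailment system would use far richer linguistic features.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy training pairs: (text, hypothesis, entails?)
TOY_PAIRS: List[Tuple[str, str, bool]] = [
    ("The cat sat on the mat", "The cat sat", True),
    ("John bought a new car yesterday", "John bought a car", True),
    ("Paris is the capital of France", "Paris is not the capital of France", False),
    ("The company reported record profits", "The weather was cold", False),
]

NEGATIONS = {"not", "no", "never"}


def features(text: str, hypothesis: str) -> Dict[str, float]:
    """Illustrative features only; not TIE's real feature set."""
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    # Share of hypothesis words that also occur in the text.
    overlap = len(t & h) / len(h) if h else 0.0
    # 1.0 if exactly one of the two fragments contains a negation word.
    neg_mismatch = 1.0 if bool(t & NEGATIONS) != bool(h & NEGATIONS) else 0.0
    return {"bias": 1.0, "overlap": overlap, "neg_mismatch": neg_mismatch}


def prob_entails(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    # Two-class MaxEnt: p(ENTAILMENT | pair) = 1 / (1 + exp(-w . f))
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def train(pairs, epochs: int = 2000, lr: float = 0.1, l2: float = 0.001) -> Dict[str, float]:
    """Per-example gradient ascent on the L2-regularized conditional log-likelihood."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for text, hyp, label in pairs:
            feats = features(text, hyp)
            error = (1.0 if label else 0.0) - prob_entails(weights, feats)
            for name, value in feats.items():
                w = weights.get(name, 0.0)
                weights[name] = w + lr * (error * value - l2 * w)
    return weights


if __name__ == "__main__":
    w = train(TOY_PAIRS)
    for text, hyp, _ in TOY_PAIRS:
        print(f"{prob_entails(w, features(text, hyp)):.2f}  {hyp}")
```

The L2 term plays the role of the Gaussian prior commonly used to regularize maximum entropy models. Plain per-example gradient ascent keeps the sketch dependency-free; a practical system would typically use an optimizer such as L-BFGS.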

Question answering

The QALL-ME framework

The QALL-ME framework is a free Service Oriented Architecture (SOA) skeleton for multilingual QA systems. The public project deliverable "The QALL-ME Architecture Design Issues and QA Framework" (Neumann et al., 2007) describes the principles of the multilingual open-domain question answering framework as well as some future directions. More information about technical details, an online demonstration, and download links can be found here.

Web-based Question answering system

This experimental web-based question answering system answers factoid and definition questions in several languages from text snippets returned by standard search engines, as described in (Figueroa and Neumann, 2006) and (Figueroa et al., 2009).

Cross-lingual open domain question answering

Quantico is a cross-lingual open-domain question answering system that accepts German questions and extracts exact answers from German or English documents, fetched either from a local document collection or from the Web; cf. (Neumann and Sacaleanu, 2006) and (Sacaleanu and Neumann, 2007).

Information extraction

Information Extraction from Scientific Publications

In the Dilia project, we developed unsupervised methods for extracting technical terms, named entities, and relations from the full text of scientific articles published in the journal Zeitschrift für Naturforschung.

German text processing

SMES is an information extraction core system for real-world German text processing. It provides a set of powerful, robust, and efficient basic natural language components and generic linguistic knowledge sources that can easily be customized for different processing tasks in a flexible manner; cf. (Neumann et al., 1997) and (Neumann et al., 2000).

German named entity recognition and chunk parsing

Parts of SMES have also been realized as a standalone system called STP, which very efficiently recognizes named entities, online noun compounds, and syntactic chunks by applying a cascade of finite-state machines; cf. (Neumann and Piskorski, 2002). This version is implemented in C++ and runs on Windows, Linux, and Mac OS. Please contact me if you would like more information about this version of SMES.
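STP's actual grammars, tag set, and automata are not described here. To illustrate the general idea of a finite-state cascade, in which each stage reads the output of the previous one, the toy sketch below collapses matches of regular patterns over a tag sequence. Named-entity grouping runs first so that the chunking stage can treat an organisation name as a noun head. The one-letter tag codes, both patterns, and the function names are hypothetical, and the input is assumed to be already tokenized and tagged.

```python
import re
from typing import List, Tuple

# Hypothetical one-letter tags: D=determiner, A=adjective, N=noun, P=proper noun,
# S=company suffix, V=verb; O and C are produced by the cascade itself.
Token = Tuple[str, str]  # (surface string, tag)

# Each stage is a regular expression over the tag string, i.e. a finite-state pattern.
# Stages run in order; later stages see the output of earlier ones.
CASCADE: List[Tuple[str, str]] = [
    (r"P+S", "O"),       # stage 1: proper noun(s) + company suffix -> organisation
    (r"D?A*[NO]", "C"),  # stage 2: optional determiner, adjectives, head -> noun chunk
]


def run_stage(tokens: List[Token], pattern: str, new_tag: str) -> List[Token]:
    """Collapse every non-overlapping match of `pattern` into a single token."""
    tags = "".join(tag for _, tag in tokens)  # one character per token
    out: List[Token] = []
    pos = 0
    for m in re.finditer(pattern, tags):
        out.extend(tokens[pos:m.start()])  # copy unmatched tokens unchanged
        span = " ".join(text for text, _ in tokens[m.start():m.end()])
        out.append((span, new_tag))
        pos = m.end()
    out.extend(tokens[pos:])
    return out


def run_cascade(tokens: List[Token]) -> List[Token]:
    for pattern, new_tag in CASCADE:
        tokens = run_stage(tokens, pattern, new_tag)
    return tokens


if __name__ == "__main__":
    sentence = [("Die", "D"), ("neue", "A"), ("Software", "N"), ("der", "D"),
                ("Beispiel", "P"), ("GmbH", "S"), ("überzeugt", "V")]
    print(run_cascade(sentence))
```

Running it prints `[('Die neue Software', 'C'), ('der Beispiel GmbH', 'C'), ('überzeugt', 'V')]`. Python's `re` is a backtracking engine, but these patterns use no back-references, so each describes a regular language that a finite automaton could recognize.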

Multilingual Dependency Parsing

MDParser is a very fast data-driven multilingual dependency parser developed by my student Alexander Volokh. Its speed makes it particularly suitable for processing very large amounts of data. We currently use it in our research systems for recognizing textual entailment (RTE); for more details, see (Volokh and Neumann, 2011) and (Volokh et al., 2010).

Morphology

Morphix

Morphix is a very fast and robust morphological component for German. Besides inflectional analysis, it analyses compounds and can also generate word forms from a given stem entry and optional additional morpho-syntactic information.
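Morphix's own lexicon and analysis algorithm are not described here. As a hedged illustration of what compound analysis involves, the toy sketch below segments a German noun into lexicon stems, allowing linking elements (Fugenelemente such as -s- or -er-) between them. `LEXICON`, `LINKERS`, and `split_compound` are hypothetical, and the longest-stem-first search is a simple heuristic that stops at the first complete segmentation rather than returning all readings.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical mini-lexicon of stems and German linking elements (Fugenelemente).
LEXICON = {"haus", "tür", "schlüssel", "arbeit", "amt", "kind", "garten"}
LINKERS = ("", "s", "es", "n", "en", "er")


def split_compound(word: str) -> Optional[List[str]]:
    """Return the stems of `word`, or None if it cannot be segmented with LEXICON."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def segment(start: int) -> Optional[Tuple[str, ...]]:
        # Try the longest stem first (a simple heuristic, not full disambiguation).
        for end in range(len(word), start, -1):
            stem = word[start:end]
            if stem not in LEXICON:
                continue
            if end == len(word):
                return (stem,)  # this stem reaches the end of the word
            for linker in LINKERS:
                if word.startswith(linker, end):
                    rest = segment(end + len(linker))
                    if rest:  # at least one more stem must follow the linker
                        return (stem,) + rest
        return None

    result = segment(0)
    return list(result) if result else None


if __name__ == "__main__":
    for w in ["Haustürschlüssel", "Arbeitsamt", "Kindergarten", "Haus", "Xyz"]:
        print(w, split_compound(w))
```

For example, `Arbeitsamt` is split into `['arbeit', 'amt']` with the linker `s` dropped, and `Kindergarten` into `['kind', 'garten']`.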

Data

Recognizing Textual Entailment

We have manually translated the English RTE-3 data set into German. The complete data set (800 pairs for development and 800 pairs for testing) can be downloaded from here. Note: this zip file dates from 2 December 2013 and contains an updated pair (id 215) in the development set! If you use the data, please cite this link.
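The layout of the German files is not described on this page. Assuming they follow the XML layout of the English RTE-3 release (an `entailment-corpus` root with `pair` elements carrying `id` and `entailment` attributes and `t`/`h` children), a minimal loader using Python's standard `xml.etree.ElementTree` could look like the sketch below. The `Pair` class and `parse_pairs` are hypothetical; check the downloaded files before relying on these element names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    pair_id: str
    text: str        # content of <t>
    hypothesis: str  # content of <h>
    entailment: str  # value of the "entailment" attribute, e.g. "YES" or "NO"


def parse_pairs(root: ET.Element) -> List[Pair]:
    """Collect all <pair> elements below `root` (assumed RTE-3-style layout)."""
    pairs: List[Pair] = []
    for elem in root.iter("pair"):
        pairs.append(Pair(
            pair_id=elem.get("id", ""),
            text=(elem.findtext("t") or "").strip(),
            hypothesis=(elem.findtext("h") or "").strip(),
            entailment=elem.get("entailment", ""),
        ))
    return pairs


if __name__ == "__main__":
    sample = """<entailment-corpus>
  <pair id="1" entailment="YES">
    <t>Der Hund bellt laut im Garten.</t>
    <h>Der Hund bellt.</h>
  </pair>
</entailment-corpus>"""
    for p in parse_pairs(ET.fromstring(sample)):
        print(p.pair_id, p.entailment, p.hypothesis)
    # For a downloaded file: parse_pairs(ET.parse(path).getroot())
```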

Customer Interaction Data of German Emails and Online Requests

We provide a public dataset of German emails and online requests sent by customers to the support center of a multimedia software company. A description of the dataset as well as a download link can be found here.