günter neumann

Textual Inference

Excitement Open Platform

The EXCITEMENT Open Platform (EOP) is an open source software platform containing state-of-the-art algorithms for recognizing textual entailment relations:

given two text fragments, one named text and the other named hypothesis, the task consists in recognizing whether the hypothesis can be inferred from the text.
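To make the task concrete, here is a minimal sketch of a naive lexical-overlap baseline over a text/hypothesis pair. It is only an illustration of the input and output of the task and is not one of the algorithms shipped with EOP; the function names and the threshold of 0.8 are assumptions chosen for this example.

```python
import re


def tokens(s):
    """Return the set of lowercased word tokens in a string."""
    return set(re.findall(r"\w+", s.lower()))


def overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    hyp = tokens(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & tokens(text)) / len(hyp)


def entails(text, hypothesis, threshold=0.8):
    """Naive decision: predict entailment if enough hypothesis words are covered.

    The threshold is a hypothetical value, not a tuned parameter.
    """
    return overlap_score(text, hypothesis) >= threshold


text = "The cat sat on the mat in the kitchen."
print(entails(text, "The cat sat on the mat."))  # all hypothesis words are covered
print(entails(text, "The dog barked."))          # only "the" is covered
```

A baseline like this ignores word order, negation and paraphrase, which is exactly where real entailment components have to do better.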

Written in Java, EOP is a main product of the project EXCITEMENT (EXploring Customer Interactions through Textual EntailMENT), which is funded by the European Commission under the European Union's Seventh Framework Programme (FP7). EOP is designed to be efficient and extendable. Highlights include:

  • Separation between Linguistic Analysis Pipelines and Entailment Components
  • Supporting modularity and interoperability among Components
  • Java API with source code
  • Pre-trained multilingual models (English, German, Italian)
  • Trainable with new sample data
  • Detailed documentation to understand the structure and implementation of EOP
  • Quick Start documentation to start using EOP right away
  • Results and configuration files archive shared among users

TIE - Textual Inference Engine

As part of the EOP platform, we have developed TIE, a lightweight tool for recognizing textual entailment based on the Maximum Entropy Modeling framework. For details on TIE, see this link and this one.
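For readers unfamiliar with Maximum Entropy modeling, the following minimal sketch shows how such a model turns weighted features into a probability for each class. It is not TIE's code: the feature names, the weights and the class labels are hypothetical and only illustrate the general form P(y|x) = exp(w_y · f(x)) / Σ_y' exp(w_y' · f(x)).

```python
import math


def maxent_probabilities(features, weights):
    """Compute class probabilities with a maximum entropy (softmax) model.

    features: dict mapping feature name -> value
    weights:  dict mapping class label -> dict of feature name -> weight
    """
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exp_scores.values())
    return {label: e / z for label, e in exp_scores.items()}


# Hypothetical weights; a real model would learn these from training pairs.
WEIGHTS = {
    "ENTAILMENT": {"word_overlap": 2.0, "negation_mismatch": -1.5},
    "NON-ENTAILMENT": {"word_overlap": -2.0, "negation_mismatch": 1.5},
}

probs = maxent_probabilities({"word_overlap": 1.0, "negation_mismatch": 0.0}, WEIGHTS)
print(max(probs, key=probs.get))  # label with the highest probability
```

In practice the weights are estimated from annotated text/hypothesis pairs, and the features come from the linguistic analysis of each pair.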

Question answering

The QALL-ME framework

The QALL-ME framework is a free Service Oriented Architecture (SOA) skeleton for multilingual QA systems. The public project deliverable "The QALL-ME Architecture Design Issues and QA Framework" (Neumann et al. 2007) describes the principles of the multilingual open-domain Question Answering framework as well as some future directions. More information about technical details, an online demonstration, and download links can be found here.

Web-based Question answering system

An experimental web-based question answering system that answers factoid and definition questions in several languages from text snippets returned by standard search engines, as described in (Figueroa and Neumann, 2006) and (Figueroa et al., 2009). Try the new research version here.

Cross-lingual open domain question answering

Quantico is a cross-lingual open-domain question answering system that accepts German questions and extracts exact answers from German or English documents, fetched either from a local document collection or from the Web, cf. (Neumann and Sacaleanu, 2006) and (Sacaleanu and Neumann, 2007).

Information extraction

Information Extraction from Scientific Publications

In the project Dilia we have developed unsupervised methods for the extraction of technical terms, Named Entities and relations from the full text of scientific articles published in the journal Zeitschrift für Naturforschung. Examples of automatically annotated articles together with additional meta information and navigation tools can be found here. The Web GUI should be simple and intuitive to use: the upper items on a Web page let you display all recognized terms or entities as a list. You can also inspect additional information, e.g., extracted relations or a term cloud of important extracted terms. The lower items on a Web page let you switch the extracted and annotated entities on or off. A double click on any annotated entity or multi-word term additionally lets you select search tools. In the upper right corner of a Web page you can select an annotated article for inspection. Just try it out!

German text processing

SMES is an information extraction core system for real-world German text processing. It provides a set of basic, powerful, robust, and efficient natural language components and generic linguistic knowledge sources which can easily be customized for processing different tasks in a flexible manner, cf. (Neumann et al., 1997), (Neumann et al., 2000). Get more information from here.

German Named entity recognition and chunk parsing

Parts of SMES have also been realized as a standalone system called STP that very efficiently recognizes named entities, online noun compounds and syntactic chunks by applying a cascade of finite state machines, cf. (Neumann and Piskorski, 2002). This version is implemented in C++ and runs on Windows, Linux and macOS. Please contact me if you want more information about this version of SMES.

Platform for Named Entity Recognition

NER-Hub is a platform for Named Entity processing. It uses a voting strategy to combine the results produced by several existing NER systems (OpenNLP, LingPipe and Stanford), aiming to reduce the number of errors they produce individually. The system's architecture is based on the OSGi framework, a Java service platform and module system, which offers flexibility in terms of component management. The project can be run as and accessed via a web service and comes with a graphical web user interface. We are currently working on making this NER-Hub platform open source.
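The idea of combining several recognizers by voting can be sketched as follows. This is a minimal illustration using per-token strict majority voting, which is an assumption for this example and not necessarily the scheme implemented in NER-Hub; the function names, the tag set and the sample outputs are hypothetical.

```python
from collections import Counter


def vote(labels, fallback="O"):
    """Return the label chosen by a strict majority of systems, else the fallback."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else fallback


def combine(system_outputs):
    """Combine per-token label sequences from several systems by majority vote.

    system_outputs: list of label sequences (at least one), one per system,
    all aligned to the same tokens.
    """
    lengths = {len(seq) for seq in system_outputs}
    if len(lengths) != 1:
        raise ValueError("all systems must label the same number of tokens")
    return [vote(column) for column in zip(*system_outputs)]


# Hypothetical outputs of three systems for: Angela Merkel visited Paris
system_a = ["PER", "PER", "O", "LOC"]
system_b = ["PER", "O", "O", "LOC"]
system_c = ["PER", "PER", "O", "ORG"]

print(combine([system_a, system_b, system_c]))
```

Falling back to "O" (no entity) when no label wins a strict majority favors precision over recall; other tie-breaking rules, such as weighting systems by their known accuracy, are equally possible.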

Other Named Entity Recognition Tools

Links to other Named Entity Recognizers developed by my students are here.

Multilingual Dependency Parsing

MDParser is a very fast data-driven multilingual dependency parser developed by my student Alexander Volokh. Its speed makes it particularly suitable for processing very large amounts of data. Currently, we are using it in our research systems for recognizing textual entailment (RTE); for more details see (Volokh and Neumann, 2011) and (Volokh et al., 2010). We are planning to make MDParser open source; in the meantime, check this site.

Morphology

Morphix

Morphix is a very fast and robust morphological component for German. Besides inflectional analysis, it analyses compounds and is also able to generate wordforms from a given stem entry and some further (optional) morpho-syntactic information. Download Morphix from here.

Data

Recognizing Textual Entailment

We have manually translated the English RTE-3 data set to German. The complete data set (800 pairs for development and 800 pairs for testing) can be downloaded from here. Note: This zip file is from 2nd December 2013 and contains an updated pair (id 215) in the development set! If you are using the data, please cite this link.

Customer Interaction Data of German Emails and Online Requests

We provide a public dataset of German emails and online requests from customers to the support center of a multimedia software company. A description of the dataset as well as a download link can be found here.