Textual Inference
Excitement Open Platform
The EXCITEMENT Open Platform (EOP) is an open-source software platform containing state-of-the-art algorithms for recognizing textual entailment relations:
given two text fragments, one called the text and the other the hypothesis, the task is to recognize whether the hypothesis can be inferred from the text.
Written in Java, EOP is a main product of the EXCITEMENT project (EXploring Customer Interactions through Textual EntailMENT), which is funded by the European Commission under the European Union's Seventh Framework Programme (FP7). EOP is designed to be efficient and extensible. Highlights include:
- Separation between Linguistic Analysis Pipelines and Entailment Components
- Supporting modularity and interoperability among Components
- Java API with source code
- Pre-trained multilingual models (i.e. English, German, Italian)
- Trainable with new sample data
- Detailed documentation to understand the structure and implementation of EOP
- Quick Start documentation to start using EOP right away
- Results and configuration files archive shared among users
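To make the task definition above concrete, here is a minimal sketch of a text/hypothesis pair and a naive word-overlap baseline. It is not part of EOP and does not use the EOP Java API; the names Pair, overlap and decide, the label strings, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One entailment problem: a text and a hypothesis (hypothetical type)."""
    text: str
    hypothesis: str


def tokens(fragment):
    """Return the lowercased word tokens of a fragment as a set."""
    return set(re.findall(r"\w+", fragment.lower()))


def overlap(pair):
    """Fraction of hypothesis tokens that also occur in the text."""
    hyp = tokens(pair.hypothesis)
    if not hyp:
        return 1.0  # an empty hypothesis is trivially covered
    return len(hyp & tokens(pair.text)) / len(hyp)


def decide(pair, threshold=0.8):
    """Toy decision rule; the threshold is an arbitrary assumption."""
    return "ENTAILMENT" if overlap(pair) >= threshold else "NONENTAILMENT"


if __name__ == "__main__":
    pair = Pair("The cat sat on the mat in the kitchen.",
                "The cat sat on the mat.")
    print(decide(pair))
```

Word overlap ignores negation, word order and paraphrase, so it only illustrates the input and output of the task, not a competitive algorithm.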
TIE - Textual Inference Engine
As part of EOP, we have developed TIE, a lightweight tool for recognizing textual entailment based on the Maximum Entropy Modeling framework. For details on TIE, see this link and this one.
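For a two-class decision such as entailment versus non-entailment, a maximum entropy model has the same form as logistic regression. The sketch below trains such a model by gradient ascent on the log-likelihood. It is not TIE's code: the single word-overlap feature, the toy samples, and the training settings are assumptions made for illustration only.

```python
import math

# Hypothetical training data: ([word-overlap feature], label),
# where label 1 means "entailment" and 0 means "no entailment".
TOY_SAMPLES = [
    ([0.9], 1), ([0.8], 1), ([1.0], 1),
    ([0.2], 0), ([0.3], 0), ([0.1], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, learning_rate=0.5):
    """Fit weights and bias by batch gradient ascent on the log-likelihood."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in samples:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = label - p  # per-sample gradient of the log-likelihood
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w + learning_rate * g for w, g in zip(weights, grad_w)]
        bias += learning_rate * grad_b
    return weights, bias


def probability(model, features):
    """Model probability that the pair with these features is an entailment."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


if __name__ == "__main__":
    model = train(TOY_SAMPLES)
    print(probability(model, [0.95]), probability(model, [0.05]))
```

A real system would use many linguistically informed features and regularization; the point here is only the shape of the model.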
Question answering
The QALL-ME framework
The QALL-ME framework is a free Service-Oriented Architecture (SOA) skeleton for multilingual QA systems. The public project deliverable The QALL-ME Architecture Design Issues and QA Framework (Neumann et al. 2007) describes the principles of the multilingual open-domain Question Answering framework as well as some future directions. More information about technical details, an online demonstration, and download links can be found here.
Web-based Question answering system
An experimental web-based question answering system that answers factoid and definition questions in several languages from text snippets returned by standard search engines, as described in (Figueroa and Neumann, 2006) and (Figueroa et al., 2009).
Cross-lingual open domain question answering
Quantico is a cross-lingual open-domain question answering system that receives German questions and extracts exact answers from German or English documents, fetched either from a local document collection or from the Web, cf. (Neumann and Sacaleanu, 2006) and (Sacaleanu and Neumann, 2007).
Information extraction
Information Extraction from Scientific Publications
In the project Dilia we have developed unsupervised methods for the extraction of technical terms, Named Entities and relations from the full text of scientific articles published in the journal Zeitschrift für Naturforschung.
German text processing
SMES is an information extraction core system for real-world German text processing. It provides a set of basic, powerful, robust, and efficient natural language components and generic linguistic knowledge sources that can easily be customized for different processing tasks, cf. (Neumann et al., 1997), (Neumann et al., 2000).
German Named entity recognition and chunk parsing
Parts of SMES have also been realized as a standalone system called STP that recognizes named entities, online noun compounds and syntactic chunks by very efficiently applying a cascade of finite-state machines, cf. (Neumann and Piskorski, 2002). This version is implemented in C++ and runs on Windows, Linux and Mac OS. Please contact me if you want more information about this version of SMES.
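The idea of a finite-state cascade can be sketched as follows: each stage scans the output of the previous stage with a regular pattern and merges the matched items into one larger unit. This is not STP's grammar or code. The one-letter tag codes (D determiner, A adjective, N noun, R proper noun, P preposition, V verb), the two stages, and the example sentence are assumptions for illustration.

```python
import re


def apply_stage(items, pattern, label):
    """Merge every run of items whose tag codes match `pattern` into one item.

    `items` is a list of (code, words) tuples with single-character codes,
    so a regular expression over the code string is a finite-state pattern
    over the item sequence.
    """
    codes = "".join(code for code, _ in items)
    merged, position = [], 0
    for match in re.finditer(pattern, codes):
        merged.extend(items[position:match.start()])
        words = [w for _, ws in items[match.start():match.end()] for w in ws]
        merged.append((label, words))
        position = match.end()
    merged.extend(items[position:])
    return merged


def cascade(items):
    """Two hypothetical stages: named entities first, then noun chunks."""
    items = apply_stage(items, r"R+", "E")     # proper-noun runs -> entity
    items = apply_stage(items, r"D?A*N", "C")  # (det) adj* noun -> chunk
    return items


if __name__ == "__main__":
    tagged = [("D", ["Die"]), ("A", ["neue"]), ("N", ["Firma"]),
              ("P", ["von"]), ("R", ["Angela"]), ("R", ["Merkel"]),
              ("V", ["kauft"]), ("N", ["Software"])]
    for code, words in cascade(tagged):
        print(code, " ".join(words))
```

Because every stage is a single left-to-right pass, the cost of the whole cascade grows linearly with the number of stages and the length of the input, which is what makes this design attractive for large volumes of text.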
Multilingual Dependency Parsing
MDParser is a very fast data-driven multilingual dependency parser developed by my student Alexander Volokh. Because of its speed, it is particularly suitable for processing very large amounts of data. We currently use it in our research systems for recognizing textual entailment (RTE); for more details see (Volokh and Neumann, 2011) and (Volokh et al., 2010).
Morphology
Morphix
Morphix is a very fast and robust morphological component for German. Besides inflectional analysis, it analyses compounds and can also generate word forms from a given stem entry and some further (optional) morpho-syntactic information.
Data
Recognizing Textual Entailment
We have manually translated the English RTE-3 data set into German. The complete data set (800 pairs for development and 800 pairs for testing) can be downloaded from here. Note: This zip file is from 2nd December 2013 and contains an updated pair (id 215) in the development set! If you use the data, please cite this link.
Customer Interaction Data of German Emails and Online Requests
We provide a public dataset of German emails and online requests from customers to the support center of a multimedia software company. A description of the dataset as well as a download link can be found here.