The main objective of QUETAL is to develop core components and knowledge sources for multi-lingual, content-oriented, NL-based question answering (QA). The answer sources are large pools of NL text documents from different sources (online via the Internet and offline via document servers). These documents are annotated with diverse kinds of linguistic, semantic and domain-specific information, which is extracted automatically from the text by applying information extraction (IE) and text mining technologies.
We explore QA technologies that operate on semi-structured NL documents and take meta-information (such as HTML or XML tags) into account for answer candidate selection. We are investigating QA technology that can handle questions and answers going beyond the currently dominant simple fact-based, short-answer-oriented systems. In particular, we will analyse and answer questions that refer to
- IE-related templates (i.e., full or partial instances of record-like attribute-value matrices, e.g., company or product information, bioinformatics-related information, scientific publications or course material), and
- definitions or explanations as found in textual or semi-structured encyclopedias.
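As an illustration of the first item, a template can be pictured as a record whose slots are filled only as far as the text supports. The following is a minimal sketch, not QUETAL's actual representation: the class `CompanyTemplate`, its slot names, and the helpers `filled_slots` and `is_full` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompanyTemplate:
    """Hypothetical record-like attribute-value matrix for company information."""
    name: Optional[str] = None
    headquarters: Optional[str] = None
    founded: Optional[int] = None


def filled_slots(template):
    """Return only the attribute-value pairs that have been filled."""
    return {
        f.name: getattr(template, f.name)
        for f in fields(template)
        if getattr(template, f.name) is not None
    }


def is_full(template):
    """A full instance has every slot filled; otherwise it is partial."""
    return len(filled_slots(template)) == len(fields(template))


# A partial instance: the headquarters slot could not be extracted.
partial = CompanyTemplate(name="Example GmbH", founded=1999)
print(filled_slots(partial))
print(is_full(partial))
```

Under this view, a question such as "When was Example GmbH founded?" amounts to a lookup of one slot in a matching instance, and a partial instance can still answer it as long as that slot is filled.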
The main scientific tenets underlying QUETAL are:
- Multi-lingual open-domain QA processing. We are developing multi-lingual methods that support processing an NL question in one language (say, German) and extracting answers from documents written in another language (say, English), or even from mixed multi-lingual document sources.
- Self-adaptive QA strategies. It is very important that a QA system can acquire control information from past QA events (e.g., a computed question-answer pair) in order to improve its performance on new QA events. Such situation-oriented information is obtained either from information supplied by, or known about, the user, or from previously computed question-answer pairs.
- Semi-structured QA technologies. To gain the maximum available information from candidate NL documents, we are exploring QA strategies that combine non-structured document parts (i.e., NL expressions) with structured document parts, including domain-specific meta-information (e.g., ontologies) represented in annotation languages such as HTML, XML, RDF or DAML/OIL.
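The idea behind the last point, using mark-up alongside the plain text when selecting answer candidates, can be sketched roughly as follows. This is an illustrative toy built on Python's standard `html.parser` module, not the project's method: the weights in `TAG_WEIGHTS`, the class `TaggedTextParser` and the function `rank_passages` are assumptions made up for this example, and the keyword matching is deliberately naive.

```python
from html.parser import HTMLParser

# Hypothetical weights: text inside these tags counts more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}


class TaggedTextParser(HTMLParser):
    """Collects (enclosing tag, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.passages = []   # (innermost enclosing tag, text)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag; this also discards
        # unclosed tags (e.g. <br>) that were opened inside it.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self.stack[-1] if self.stack else ""
            self.passages.append((tag, text))


def rank_passages(html, keywords):
    """Rank passages by keyword overlap, scaled by the enclosing tag's weight."""
    parser = TaggedTextParser()
    parser.feed(html)
    parser.close()
    scored = []
    for tag, text in parser.passages:
        words = text.lower().split()
        overlap = sum(1 for k in keywords if k.lower() in words)
        if overlap:
            scored.append((overlap * TAG_WEIGHTS.get(tag, 1.0), text))
    # sorted() is stable, so equally scored passages keep document order.
    return [text for _, text in sorted(scored, key=lambda p: -p[0])]


doc = (
    "<html><head><title>QUETAL project</title></head>"
    "<body><p>The project QUETAL studies question answering.</p>"
    "<p>Unrelated text.</p></body></html>"
)
print(rank_passages(doc, ["quetal"]))
```

Here the title passage outranks the body paragraph even though both mention the keyword once, because the structural information (the enclosing tag) contributes to the score. Richer meta-information such as RDF annotations or ontologies would need a different representation than this tag-weight table.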