The PARADICE project will build a core engine for natural language processing that can easily be configured to meet specific requirements. Its main emphasis is therefore on a system with a flexible architecture and interfaces to dialog systems and other applications. The system should provide a large-coverage German grammar, lexicon, and morphology, and it should support efficient and robust processing.
For the linguistic knowledge bases, the main focus will be on continuously broadening grammatical coverage and on integrating large-scale lexical resources. The grammar will be an extension of the HPSG-style grammar for German, with integrated syntax and semantics. The lexicon and morphology components will reuse the lexical database SADAW, which contains approximately 120,000 lemmata and about 1.5 million word forms.
Research on efficient and robust processing will explore several strategies, both in parallel and in combination. The main strategies for increasing efficiency are Explanation-Based Learning, the compilation of HPSG structures into Tree Adjoining Grammars, and the use of faster general processing algorithms together with preferences in the control structure. Robustness will be achieved through specific error-tolerant methods that can deal with unknown words and partially ungrammatical utterances.
These performance-oriented methods will be integrated with the competence-based methods of PARADICE within a flexible system architecture in which module integration follows an object-oriented design. To simplify module integration, protocols will be defined declaratively by means of rules.