The aim of this project is to improve existing methodology for generic deep linguistic analysis, i.e., the syntactic and semantic analysis needed for many language technology applications. A dependency grammar model will be developed that extends the representations of successful data-driven dependency parsing schemes with additional elements of linguistic and cognitive sophistication, such as a typed feature system, explicit soft constraints, the use of both semantic and syntactic dependencies, and means for producing partial results incrementally.
The knowledge incorporated in the existing German HPSG grammar of the Lab and in the Stanford English Resource Grammar (ERG) will be imported into the planned fully lexicalized dependency grammar. This will be possible because of the rigorous and consistent use of the multiple-inheritance type hierarchy as the sole basis for all encoded linguistic knowledge. By redefining all lexical types, the existing lexicons will be converted automatically to the new format. The existing German and English HPSG grammars will also serve as a baseline for comparison.
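The role of the type hierarchy in lexicon conversion can be illustrated with a minimal sketch. It assumes, purely for illustration, that lexical entries only name a lexical type and that all features are inherited monotonically from its supertypes; the class TypeHierarchy, the type names, and the feature names are hypothetical and are not taken from the German grammar or the ERG.

```python
from typing import Dict, List


class TypeHierarchy:
    """Toy multiple-inheritance type hierarchy with atomic feature values."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}
        self.features: Dict[str, Dict[str, str]] = {}

    def define(self, name: str, parents=(), **features: str) -> None:
        # Defining a name again replaces its earlier definition.
        self.parents[name] = list(parents)
        self.features[name] = dict(features)

    def ancestors(self, name: str) -> List[str]:
        # All supertypes of `name`, most general first, followed by `name`.
        seen: List[str] = []

        def visit(t: str) -> None:
            for p in self.parents[t]:
                visit(p)
            if t not in seen:
                seen.append(t)

        visit(name)
        return seen

    def subsumes(self, general: str, specific: str) -> bool:
        return general in self.ancestors(specific)

    def resolve(self, name: str) -> Dict[str, str]:
        # Monotonic inheritance: conflicting values are an error, not an override.
        merged: Dict[str, str] = {}
        for t in self.ancestors(name):
            for key, value in self.features[t].items():
                if key in merged and merged[key] != value:
                    raise ValueError(f"conflict on {key!r} in type {name!r}")
                merged[key] = value
        return merged


h = TypeHierarchy()
h.define("sign")
h.define("verb", ["sign"], pos="v")
h.define("transitive", ["sign"], comps="obj")          # phrase-structure style
h.define("trans_verb_lex", ["verb", "transitive"])     # inherits from both

lexicon = {"chase": "trans_verb_lex"}                  # entries only name a type
before = h.resolve(lexicon["chase"])

# Redefine one type in a dependency-oriented way; the lexicon is untouched.
h.define("transitive", ["sign"], deps="subj,obj")
after = h.resolve(lexicon["chase"])

print(before)
print(after)
```

Because the entry for "chase" refers only to a type name, redefining the type changes what the entry expands to without editing the lexicon itself. A real grammar would use typed feature structures and unification rather than flat string values.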
Coverage of the grammars will be extended by learning from dependency banks, either native or converted from suitable treebanks. The lexicon will be extended by data-driven lexical-type prediction.
Parsing will be incremental and local, operating within a window of 3 to 5 words. Local decisions will be based on preferences learned from dependency banks. Several alternative parsing models, influenced in spirit by transition-based approaches to dependency parsing, will be implemented and tested.
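One possible reading of incremental, window-bounded parsing is sketched here as a greedy shift-reduce loop in the spirit of transition-based parsers. It is not one of the project's planned models: the function names, the action labels, the part-of-speech tags, and the rule table in toy_choose are assumptions made for illustration, and the rule table merely stands in for preferences that would be learned from dependency banks.

```python
from typing import Callable, Iterator, List, Tuple

Token = Tuple[str, str]   # (word form, part-of-speech tag)
Arc = Tuple[int, int]     # (head index, dependent index)

# Hypothetical hand-written preferences standing in for a learned model.
RULES = {
    ("DET", "NOUN"): "LEFT",    # the right word heads the left word
    ("NOUN", "VERB"): "LEFT",
    ("VERB", "NOUN"): "RIGHT",  # the left word heads the right word
}


def toy_choose(words: List[Token], left: int, right: int) -> str:
    return RULES.get((words[left][1], words[right][1]), "SHIFT")


def parse_incrementally(
    words: List[Token],
    choose: Callable[[List[Token], int, int], str],
    window: int = 4,
) -> Iterator[Tuple[int, List[Arc]]]:
    """Yield (word index, arcs built so far) after each word is read."""
    stack: List[int] = []
    arcs: List[Arc] = []
    for i in range(len(words)):
        stack.append(i)  # SHIFT the next word
        while len(stack) >= 2:
            left, right = stack[-2], stack[-1]
            if right - left >= window:
                break  # the two words do not fit in one window
            action = choose(words, left, right)
            if action == "LEFT":
                arcs.append((right, left))
                del stack[-2]
            elif action == "RIGHT":
                arcs.append((left, right))
                stack.pop()
            else:
                break
        yield i, list(arcs)  # partial result after word i


SENTENCE = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
            ("a", "DET"), ("cat", "NOUN")]

for index, partial in parse_incrementally(SENTENCE, toy_choose):
    print(SENTENCE[index][0], partial)
```

A partial analysis is available after every word, and no attachment is made between words that do not fit into a single window. Note that this greedy sketch attaches a right dependent as soon as it is licensed, so that word can no longer collect dependents of its own further to the right; handling such cases is one of the points on which alternative parsing models would differ.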
Grammar development will distinguish between a strict, tightly regimented core grammar, which can serve as the starting point for many applications, and robust application-specific extensions of this core grammar.
The new approach will be tested in two applications: (i) a diagnostic grammar checker for exercises and exams in computer-assisted language learning and (ii) information extraction of complex relations, including events and opinions.