DFKI-LT - Bottom-Up Earley Deduction for Preference-Driven Natural Language Processing
Bottom-Up Earley Deduction for Preference-Driven Natural Language Processing
1 PHD-Thesis, Universität des Saarlandes, Department of Computational Linguistics, Saarbrücken, 1997
The thesis discusses the processing of principle-based grammars such as HPSG, with an application to best-first processing for disambiguation, selection of paraphrases and possibly also handling ill-formed input. Grammars are treated as a definite clause programs, to which program transformation and deduction techniques can be applied. In particular, the view of constraint logic programming is adopted, which abstracts away from the handling of constraints (which is regarded as a service provided by the constraint solver) and concentrates on resolution strategies. The contributions of the thesis are the following: A constraint language supporting sorted feature terms, Prolog terms, multi-dimensional inheritance, finite domains, by compilation to Prolog terms. The constraint language has been extended with external constraint solvers for set constraints, LP constraints, guarded constraints. A partial deduction system for compiling a principle-based grammar into a set of grammar rules. Partial deduction can be applied selectively through control information in the program, which makes it possible to do experimental work in order to determine the selection of goals to which partial deduction is best applied in order to bring the greatest performance improvement. Bottom-Up Earley deduction as a deduction system which generalises bottom-up chart parsing to a wider class of grammars/programs than just those with a context-free backbone. Bottom-up Earley deduction is better suited for handling discontinuous constituency than its top-down counterpart, it allows best-first search based on bottom-up information (e.g. tag probabilities), and provides different indexing schemes for different modes of combination of items. The correctness, completeness and termination properties of the algorithm have been shown. A generalised linguistic deduction system, which allows combination of different deduction strategies via control annotations, and which is tightly integrated with the underlying programming language in order to achieve efficiency. A fully incremental algorithm for bottom-up Earley deduction has been specified, which can cope efficiently with a change in the query by re-using intermediate deduction results as much as possible. Augmentation of a definite clause language with preference values. We have discussed how preference values from a variety of sources can be combined.
Files: BibTeX, Erbach:1997:BED.pdf