DFKI-LT - Dissertation Series

Vol. XXX

Rebecca Dridan: Using lexical statistics to improve HPSG parsing

ISBN: 978-3-933218-29-2
221 pages
price: € 15

In this thesis, we describe research aimed at discovering how lexical statistics could improve the performance of a parser that uses hand-crafted grammars in the HPSG framework. Initially, we explain relevant characteristics of the HPSG formalism and of the PET parser that is used for experimentation. The parser and grammar combination is situated in the broader parser space, and we discuss the respective advantages and disadvantages of deep hand-crafted grammars, as opposed to other systems that are derived from treebanks or that provide a less detailed analysis. We find that many disadvantages of the deep hand-crafted system come from the same sources as the advantages this combination yields. Rather than eliminate the disadvantages by eliminating the advantages, we look to combine advantages from different systems. An overview of hybrid processing examines methods that have worked in other situations, and we then focus on supertagging as one method particularly appropriate for a lexicalised formalism such as HPSG.

Before any changes are made to the parser or grammar, we need to define the evaluation framework by which we will judge the various modifications. Parser evaluation is not straightforward. Efficiency, robustness and accuracy are all important facets of parser performance, and the three are often in contention with each other. We briefly outline issues around the first two facets, and then give a detailed overview of how parsing accuracy has been evaluated in the past. Building on the conclusions drawn from this overview, we then define a new granular evaluation metric, Elementary Dependency Match (EDM), that is suitable for evaluating the detailed semantic information produced by the PET parser.
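
As a rough illustration of how such a granular metric can be computed, the sketch below (in Python) scores a test analysis against a gold analysis once both have been reduced to (predicate, role, value) triples. The triple extraction from the parser's semantic output is not shown, and the example triples and the name edm_score are hypothetical, not drawn from the thesis.

    from collections import Counter

    def edm_score(gold_triples, test_triples):
        """Precision, recall and F1 over elementary dependency triples."""
        gold, test = Counter(gold_triples), Counter(test_triples)
        matched = sum((gold & test).values())  # multiset intersection
        precision = matched / sum(test.values()) if test else 0.0
        recall = matched / sum(gold.values()) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Hypothetical triples for "the dog barks":
    gold = [("_bark_v", "ARG1", "_dog_n"), ("_the_q", "BV", "_dog_n")]
    test = [("_bark_v", "ARG1", "_dog_n")]
    print(edm_score(gold, test))  # (1.0, 0.5, 0.666...)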

Supertags have been defined as lexical descriptions that embody richer information than part-of-speech tags, and in particular include dependency information. This research looks at different forms of lexical description, ranging from a simple 13-tag set of part-of-speech tags that would not generally be considered supertags, up to very fine-grained descriptions that include information about morphology, subcategorisation and selectional preference of prepositions. The questions we ask are:

  1. Which tag forms are predictable from the available training data?
  2. Which tag forms are useful for different aspects of the parsing process?

We attempt to answer the first question with a set of comprehensive experiments comparing the performance of a Hidden Markov Model-based tagger and a Maximum Entropy Markov Model-based tagger. We vary the amount and source of the training data, and also evaluate against different methods of assigning tags.
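
As background for this comparison, the following sketch shows the decoding step of a bigram HMM tagger. The tag names and probability tables are toy placeholders, and the flooring of zero probabilities is a simplification rather than the smoothing used in the thesis.

    import math

    def viterbi(tokens, tags, trans, emit, start):
        """Most probable tag sequence under a bigram HMM."""
        # start[t] = P(t | <s>), trans[p][t] = P(t | p), emit[t][w] = P(w | t)
        def logp(x):
            return math.log(x) if x > 0 else math.log(1e-12)
        V = [{t: (logp(start.get(t, 0)) + logp(emit[t].get(tokens[0], 0)), [t])
              for t in tags}]
        for w in tokens[1:]:
            row = {}
            for t in tags:
                prev = max(tags, key=lambda p: V[-1][p][0] + logp(trans[p].get(t, 0)))
                score, path = V[-1][prev]
                row[t] = (score + logp(trans[prev].get(t, 0)) + logp(emit[t].get(w, 0)),
                          path + [t])
            V.append(row)
        return max(V[-1].values(), key=lambda sp: sp[0])[1]

    # Toy example with two hypothetical lexical types:
    tags = ["n_le", "v_le"]
    start = {"n_le": 0.6, "v_le": 0.4}
    trans = {"n_le": {"n_le": 0.3, "v_le": 0.7}, "v_le": {"n_le": 0.6, "v_le": 0.4}}
    emit = {"n_le": {"dogs": 0.8}, "v_le": {"bark": 0.9}}
    print(viterbi(["dogs", "bark"], tags, trans, emit, start))  # ['n_le', 'v_le']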

To explore the usefulness of the different tag forms, we look at three different aspects of the parsing process where lexical information has been demonstrated to help. One of the main sources of parser failure is unknown words in the input. Hence, we look at using lexical statistics to predict information about an unknown word, in order to boost parser robustness. We use different granularities of tag forms that are based on the distinctions made within the grammar, and compare them to using no external information and to using Penn Treebank-style part-of-speech tags predicted by a tagger trained on one million words. While robustness is the main focus of these experiments, we also evaluate the effects on efficiency and accuracy when using each tag form.
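
The sketch below illustrates the basic mechanism, assuming a lexicon mapping known words to lexical entries and a table mapping predicted tags to generic entries; all names here are hypothetical, as the actual mapping is defined by the grammar.

    # Hypothetical lexicon and tag-to-generic-entry table.
    LEXICON = {"dog": ["n_-_c_le"], "barks": ["v_-_le"]}
    GENERIC_BY_TAG = {"NN": "n_-_mc_le", "VBZ": "v_np_le"}

    def lexical_entries(word, predicted_tag):
        """Look the word up; fall back to a generic entry for its tag."""
        if word in LEXICON:
            return LEXICON[word]
        # Unknown word: posit a generic entry licensed by the predicted
        # tag so that parsing can proceed instead of failing outright.
        generic = GENERIC_BY_TAG.get(predicted_tag)
        return [generic] if generic else []

    print(lexical_entries("wugs", "NN"))  # ['n_-_mc_le']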

The second aspect of parsing that we consider is the impact of lexical ambiguity on parser efficiency. This has been the focus of previous work involving supertags, and following that work, we use the predicted tags of different forms to restrict the lexical items that are considered in parsing. We first carry out an oracle experiment with each tag type, in order to determine the upper bound on performance possible with this technique. Then, three different methods of lexical restriction are used: first, we allow only lexical items compatible with the single top tag predicted for each token; then we allow multiple tags, depending on the probabilities the taggers assign to them; finally, we selectively restrict particular tokens of the input, depending on the probability assigned by the tagger to the top tag.
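
The three strategies can be sketched as follows, assuming the tagger returns a probability distribution over tags for each token; the method names and threshold values are illustrative, not those used in the experiments.

    def restrict(tag_probs, method="selective", beta=0.1, confidence=0.9):
        """Return the set of tags whose lexical items the parser may keep.

        tag_probs maps tag -> probability for one token; an empty result
        means the token is left unrestricted.
        """
        top_tag, top_p = max(tag_probs.items(), key=lambda kv: kv[1])
        if method == "single":     # keep only the single best tag
            return {top_tag}
        if method == "multi":      # keep tags within a factor beta of the best
            return {t for t, p in tag_probs.items() if p >= beta * top_p}
        if method == "selective":  # restrict only tokens the tagger is sure of
            return {top_tag} if top_p >= confidence else set()
        raise ValueError(method)

    probs = {"n_le": 0.85, "v_le": 0.10, "adj_le": 0.05}
    print(restrict(probs, "multi"))      # {'n_le', 'v_le'} (0.10 >= 0.085)
    print(restrict(probs, "selective"))  # set(): 0.85 < 0.9, leave unrestricted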

Finally, lexical statistics have also been shown to be useful for increasing the accuracy of parse ranking. While a full investigation of this application of lexical statistics is too large to attempt in this thesis, we discuss some aspects of how it could work and how this use of statistics differs from the previous experiments. We then carry out a preliminary exploration of the data to give some indication of the effect the available lexical statistics could have once properly integrated into the parse ranking statistical model.