Sabine Lehmann: Towards a Theory of Syntactic Phenomena

This thesis is the product of a long process of evolution stretching back well before the thesis work began. Its inception lay in the realisation that, for certain (or perhaps all) practical language engineering tasks, fundamental problems of definition remained to be solved. The most basic problem concerned the definition of the ubiquitous concept of a linguistic phenomenon.

The first ideas in this regard were developed during work on large-scale test suites for different languages. Test suite coverage was invariably defined in terms of sets of linguistic phenomena, and the description of the data was inherently phenomena-oriented. "Phenomena-orientation" is in fact key to NLP evaluation. However, while linguistics and NLP design typically describe their data in terms of syntactic phenomena, it is striking that NLP applications are structured along radically different lines. Phenomena have no ontological status in the grammars the applications use, and there is no simple mapping between phrase structure or constituency-based analyses and the phenomena that the grammar as a whole should model.

Clearly, this represents a serious problem when it comes to evaluating or improving these applications, since the evaluation proceeds along a different dimension from that by which the grammar was constructed. In engineering terms, these competing dimensions of description lead to practical problems for developers working on large-scale grammars. Changing existing rules or adding new ones may have unpredictable results elsewhere in the grammar, because a single rule corresponds to the realisation of a complex set of interacting phenomena.

Initially, then, this thesis aims to overcome this discrepancy by proposing the construction of grammars partitioned according to syntactic phenomena. The main idea is to view grammar engineering from a different perspective, one in which, at the level of the formal implementation, grammars are seen as systems of interacting phenomena. We believe that such an approach would be particularly useful in two ways. Firstly, it would facilitate distributed grammar engineering, since linguists could access all and only the information they needed to work on a particular phenomenon. Secondly, it would make clear how and where phenomena were encoded, making reusability more straightforward. Since this whole enterprise is based on the concept of syntactic phenomena, its feasibility depends on a proper definition of that notion. Furthermore, a careful classification of phenomena gives a better understanding of how they interact.

Our investigations raise another important issue, namely the relation between phrase structure and phenomena. The notion of phrase structure is intrinsically related to phenomena, since phenomena form the foundation on which phrase structural descriptions of language are built. It is therefore important to investigate the definition of phrase structure from that point of view.