DFKI-LT - An Interface between Text Structures and Linguistic Descriptions

Thierry Declerck
An Interface between Text Structures and Linguistic Descriptions
in: Ellen Christoffersen, Bradley Music (eds.):
Proceedings of the Datalingvistisk Forening (DALF'97), June 9-10, Pages 8-22, Kolding, Denmark, o.A., 1997
 
This paper describes various uses of the Text Handling to Linguistic Structure (TH-LS) component of the Advanced Linguistic Engineering Platform (ALEP). The Text Handling (TH) subsystem of ALEP converts the input string into an SGML text. The TH-LS component consists of a set of so-called 'tsls' rules that define a mapping between textual structures (TS) and (partial) linguistic descriptions (PLS). The instantiated PLS are the input to the linguistic parser, which deals with linguistic structures (LS). I show how an adequate use of the TH-LS interface permits the modularization of the lingware and the definition of subgrammars, from the morpheme level up to the whole text, taking into consideration both processing steps and levels of grammar description. The TH subsystem of ALEP also provides the +tag for user-supplied markup. Intensive use of this possibility, together with the integration of information delivered by a PoS tagger into the TH component, allowed both a substantial extension of the coverage and a significant improvement in the efficiency of the ALEP-based grammars, since the parser receives linguistically enriched PLS as input.
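
As an illustration only, the mapping from text structures to partial linguistic descriptions can be sketched in Python. This is not ALEP's rule formalism: the tag names (`W`, `S`), the `TYPE` attribute standing in for PoS-tagger output, the rule table `TSLS_RULES`, and the function `lift` are all hypothetical, chosen only to show how marked-up text might be turned into partial descriptions that a parser would then complete.

```python
import re

# Hypothetical rule table (an assumption, not ALEP syntax):
# (tag name, TYPE attribute or None) -> partial linguistic description.
TSLS_RULES = {
    ("W", "NOUN"): {"cat": "n"},
    ("W", "VERB"): {"cat": "v"},
    ("W", None): {},  # word without PoS information: category left open
}

# Matches leaf elements such as <W TYPE="NOUN">parser</W> or <W>the</W>.
ELEMENT = re.compile(r'<(\w+)(?:\s+TYPE="(\w+)")?>([^<]*)</\1>')


def lift(marked_up_text):
    """Map each tagged leaf element to a partial linguistic description."""
    descriptions = []
    for tag, pos, content in ELEMENT.findall(marked_up_text):
        # findall yields '' for an absent TYPE attribute; normalise to None.
        rule = TSLS_RULES.get((tag, pos or None))
        if rule is None:
            continue  # no rule for this text structure, so it is not lifted
        descriptions.append({"string": content, **rule})
    return descriptions


if __name__ == "__main__":
    text = '<S><W>the</W> <W TYPE="NOUN">parser</W> <W TYPE="VERB">runs</W></S>'
    for description in lift(text):
        print(description)
```

In this sketch, words that carry PoS information arrive at the parser with their category already fixed, while untagged words are passed on with the category open. That is the sense in which richer markup can reduce the work left to the parser.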
 
Files: BibTeX, declerck97_dalf.ps.gz, Declerck:1997:IBT.pdf