Efficient Integrated Tagging of Word Constructs

Andrew Bredenkamp, Thierry Declerck, Frederik Fouvry, Bradley Music

In: Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), August 5-9, Copenhagen, Denmark, pages 1028-1031. Morgan Kaufmann Publishers, 1996.


We describe a robust text-handling component that can deal with free text in a wide range of formats and can successfully identify a wide range of phenomena, including chemical formulae, dates, numbers and proper nouns. The set of regular expressions used to capture German numbers in written form ("sechsundzwanzig") is given as an example. Proper noun "candidates" are identified by means of regular expressions and are then accepted or rejected on the basis of runtime interaction with the user. This tagging component is integrated into a large-scale grammar development environment and provides direct input to the system's grammatical analysis component by means of "lift" rules, which convert tagged text into partial linguistic structures.
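
To make the idea of capturing written-form German numbers with regular expressions concrete, here is a minimal sketch in Python. It is not the set of expressions given in the paper: the pattern names, the restriction to the range 1 to 99, the function `tag_numbers` and the `<num>` tag format are all assumptions made for illustration only.

```python
import re

# Illustrative building blocks (assumed, not taken from the paper).
# Unit forms used as the first part of a compound, e.g. "sechs" in "sechsundzwanzig".
UNIT_PREFIX = r"(?:ein|zwei|drei|vier|fünf|sechs|sieben|acht|neun)"
# Unit forms standing alone; "eins" is used so the article "ein" is not tagged.
UNIT_ALONE = r"(?:eins|zwei|drei|vier|fünf|sechs|sieben|acht|neun)"
TEENS = (r"(?:zehn|elf|zwölf|dreizehn|vierzehn|fünfzehn|"
         r"sechzehn|siebzehn|achtzehn|neunzehn)")
TENS = r"(?:zwanzig|dreißig|vierzig|fünfzig|sechzig|siebzig|achtzig|neunzig)"

# 1 to 99: a teen, an optional "<unit>und" followed by a tens word, or a bare unit.
NUMBER_1_99 = rf"(?:{TEENS}|(?:{UNIT_PREFIX}und)?{TENS}|{UNIT_ALONE})"
NUMBER_RE = re.compile(rf"\b{NUMBER_1_99}\b", re.IGNORECASE)


def tag_numbers(text: str) -> str:
    """Wrap each written-form number in a hypothetical <num>...</num> tag."""
    return NUMBER_RE.sub(lambda m: f"<num>{m.group(0)}</num>", text)


if __name__ == "__main__":
    print(tag_numbers("Er ist sechsundzwanzig Jahre alt."))
```

The word boundaries keep the pattern from firing inside longer words such as ordinals ("sechsundzwanzigste"). A fuller treatment would also need hundreds, thousands and inflected forms, which this sketch leaves out.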

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence