Bootstrapping a Domain-specific Terminological Taxonomy from Scientific Text

Magdalena Wolska, Ulrich Schäfer, The Nghia Pham

In: Proceedings of the 9th International Conference on Terminology and Artificial Intelligence (TIA-2011), November 8-10, Paris, France, pages 17-23, INALCO, 11/2011.


We present an approach to the automated extraction of a taxonomy of domain-specific terms from scientific discourse. The approach has been developed and evaluated in the domain of computational linguistics. Concept pairs in an is-a relation have been extracted from a subset of the ACL Anthology and WeScience. The correctness of the resource has been verified by crowdsourcing: to attract domain experts to identify correct and invalid is-a pairs, we used "games with a purpose". The popular games Tetris and Invaders were modified to support concurrent and efficient annotation of domain term pairs during play. High quality of the resulting annotations was ensured by exploiting redundancy: at least five-way agreement was required for a candidate is-a pair to be considered correctly extracted. Based on the crowdsourced evaluation, the extraction method achieved a precision of around 80%.
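
The redundancy criterion can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each game judgment is recorded as a (hyponym, hypernym, is_valid) tuple, that "five-way agreement" means at least five "valid" votes for a pair, and that precision is the share of candidate pairs accepted this way. The function names and the toy data are hypothetical.

```python
from collections import Counter

# Threshold taken from the abstract: at least five-way agreement.
MIN_AGREEMENT = 5


def accepted_pairs(judgments, min_agreement=MIN_AGREEMENT):
    """Return the (hyponym, hypernym) pairs with enough 'valid' votes.

    judgments: iterable of (hyponym, hypernym, is_valid) tuples,
    one tuple per player judgment (assumed record format).
    """
    valid_votes = Counter()
    for hyponym, hypernym, is_valid in judgments:
        if is_valid:
            valid_votes[(hyponym, hypernym)] += 1
    return {pair for pair, votes in valid_votes.items() if votes >= min_agreement}


def precision(candidates, judgments, min_agreement=MIN_AGREEMENT):
    """Fraction of candidate pairs confirmed by sufficient agreement."""
    candidates = set(candidates)
    if not candidates:
        return 0.0
    confirmed = accepted_pairs(judgments, min_agreement) & candidates
    return len(confirmed) / len(candidates)


# Toy data, invented for illustration only.
toy_candidates = [("parser", "tool"), ("corpus", "algorithm")]
toy_judgments = (
    [("parser", "tool", True)] * 5
    + [("corpus", "algorithm", True)] * 2
    + [("corpus", "algorithm", False)] * 3
)
toy_precision = precision(toy_candidates, toy_judgments)
```

In this toy run only the first pair collects five "valid" votes, so one of the two candidates counts as correctly extracted.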


Further Links

TIA05.pdf (pdf, 871 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence