Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS

Dominic Widdows, Stanley Peters, Scott Cederberg, Chiu-Ki Chan, Diana Steffen, Paul Buitelaar

In: Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine (ACL-03), July 11, Sapporo, Japan, pages 9-16, 13, ACL, Morristown, NJ, USA, 7/2003.


This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques, which rely only on the structure of UMLS, and bilingual techniques, which also rely on the availability of parallel corpora. The best results are obtained using relations between terms given by UMLS, a method that achieves 74% precision and 66% coverage for English, and 79% precision and 73% coverage for German, on evaluation corpora, and over 83% coverage over the whole corpus. The success of this technique for German shows that a lexical resource giving relations between concepts used to index an English document collection can be used for high-quality disambiguation in another language.
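To make the idea of relation-based disambiguation and the precision/coverage figures more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a toy sense inventory and a toy set of concept relations; the identifiers (`C_COMMON_COLD` and so on), the scoring rule (count related concepts in the context), and the helper names are all hypothetical and do not come from UMLS or from the paper. It also assumes that precision is the share of attempted decisions that are correct and that coverage is the share of ambiguous occurrences for which a decision is made.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical toy data: these sense IDs and relations are invented for
# illustration and are NOT real UMLS concept identifiers or relations.
SENSES: Dict[str, List[str]] = {
    "cold": ["C_TEMPERATURE", "C_COMMON_COLD"],
    "virus": ["C_VIRUS"],
    "fever": ["C_FEVER"],
    "water": ["C_WATER"],
}
RELATED: Set[Tuple[str, str]] = {
    ("C_COMMON_COLD", "C_VIRUS"),
    ("C_COMMON_COLD", "C_FEVER"),
    ("C_TEMPERATURE", "C_WATER"),
}


def related(a: str, b: str) -> bool:
    """Treat the toy relations as symmetric."""
    return (a, b) in RELATED or (b, a) in RELATED


def disambiguate(term: str, context: Sequence[str]) -> Optional[str]:
    """Pick the sense of `term` related to the most concepts in `context`.

    Returns None (no decision) when no sense has a related context concept
    or when the best score is tied, which is what lowers coverage.
    """
    scores = {
        sense: sum(
            1
            for word in context
            for other in SENSES.get(word, [])
            if related(sense, other)
        )
        for sense in SENSES.get(term, [])
    }
    if not scores:
        return None
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


def precision_and_coverage(
    predictions: Sequence[Optional[str]], gold: Sequence[str]
) -> Tuple[float, float]:
    """Precision over attempted items; coverage over all items (assumed definitions)."""
    attempted = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(attempted) / len(gold) if gold else 0.0
    precision = (
        sum(p == g for p, g in attempted) / len(attempted) if attempted else 0.0
    )
    return precision, coverage
```

Under these assumptions, `disambiguate("cold", ["virus", "fever"])` selects the hypothetical common-cold sense, while a context with no related concepts yields no decision. That is why a method of this kind can report precision and coverage separately: it abstains when the relations give no evidence.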

biomed-wsd.pdf (pdf, 127 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence