

Cross-Lingual Information Retrieval through Semantic Annotation

Spela Vintar; Paul Buitelaar; Bärbel Ripplinger
In: Proceedings of the Workshop on Natural Language Processing in Biomedical Applications (NLPBA-02), March 8-9, Nicosia, Cyprus, 2002.


We present a framework for concept-based cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach uses the Unified Medical Language System (UMLS) as the primary source of semantic data: documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes POS tagging, morphological analysis, phrase recognition, and the identification of medical concepts and the semantic relations between them. The paper describes experiments in monolingual and bilingual document retrieval, performed on a parallel English-German corpus of medical abstracts. The results show, first, that linguistic processing, especially lemmatisation and compound analysis, is crucial for achieving good baseline performance. Second, they show that semantic information, specifically the combined use of concepts and relations, significantly improves performance in cross-lingual retrieval.
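The abstract does not describe the implementation, so the following is only a minimal illustrative sketch of the general idea behind concept-based CLIR, not the MUCHMORE system. It shows why mapping text to language-independent concept identifiers lets an English query match a German document that shares no surface words with it, and why lemmatisation and compound splitting matter for German. Everything here is an assumption: the tiny hand-made `LEXICON` with invented concept IDs (placeholders, not real UMLS identifiers) stands in for UMLS, the `LEMMAS` dictionary and greedy `split_compound` stand in for real morphological analysis, and cosine similarity over concept counts is just one possible matching scheme. Semantic relations are not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for UMLS: lemma -> concept ID.
# The IDs are invented placeholders, not real UMLS CUIs.
LEXICON = {
    "heart": "CONCEPT_HEART", "herz": "CONCEPT_HEART",
    "muscle": "CONCEPT_MUSCLE", "muskel": "CONCEPT_MUSCLE",
    "inflammation": "CONCEPT_INFLAMMATION", "entzündung": "CONCEPT_INFLAMMATION",
}

# Hypothetical inflected form -> lemma table (a stand-in for a real lemmatiser).
LEMMAS = {"hearts": "heart", "muscles": "muscle", "muskeln": "muskel"}


def split_compound(word, vocab):
    """Greedily split a (German) compound into known parts; return [word] if impossible."""
    if word in vocab:
        return [word]
    for i in range(len(word) - 1, 0, -1):  # try the longest known prefix first
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = split_compound(tail, vocab)
            if all(part in vocab for part in rest):
                return [head] + rest
    return [word]


def annotate(text):
    """Tokenise, lemmatise, split compounds, and count the concepts found."""
    concepts = Counter()
    for token in re.findall(r"\w+", text.lower()):
        lemma = LEMMAS.get(token, token)
        for part in split_compound(lemma, LEXICON):
            if part in LEXICON:
                concepts[LEXICON[part]] += 1
    return concepts


def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Rank documents by concept overlap with the query, best first."""
    q = annotate(query)
    scores = [(doc_id, cosine(q, annotate(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical German documents, queried in English.
DOCS = {
    "d1": "Eine Herzmuskelentzündung kann nach Virusinfektionen auftreten.",
    "d2": "Training stärkt die Muskeln der Beine.",
}

if __name__ == "__main__":
    for doc_id, score in rank("inflammation of the heart muscle", DOCS):
        print(doc_id, round(score, 3))
```

Here the compound "Herzmuskelentzündung" is split into "herz", "muskel" and "entzündung", so `d1` covers all three query concepts and ranks above `d2`, which only matches the muscle concept via the lemma of "Muskeln". Without the lemma table and the compound splitter, neither German document would match any concept, which loosely mirrors the abstract's point that lemmatisation and compound analysis are needed for a good baseline.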