
Publication

Cross- & multi-lingual medication detection: a transformer-based analysis

Lisa Raithel; Johann Frei; Philippe Thomas; Roland Roller; Pierre Zweigenbaum; Sebastian Möller; Frank Kramer
In: BMC Medical Informatics and Decision Making (MIDM), Vol. 25, No. 1, Pages 1-13, Springer, 2025.

Abstract

Extracting specific information, such as medication mentions, from large unstructured medical texts can be challenging, especially when no annotated corpus exists in the target language for training. To overcome this, leveraging existing machine learning models and datasets is essential, and since most pre-trained resources are in English, adopting multi-lingual approaches can help transfer knowledge between languages. In this work, we investigate the use of a multi-lingual transformer model in a multi-lingual and cross-lingual setting to extract drug names from medical texts using named entity recognition in four European languages: German, English, French, and Spanish. We report the scores obtained by cross-lingual transfer with several published datasets after fine-tuning a multi-lingual model, aiming to provide empirical evidence of how the transfer of “medical” knowledge between languages can be expected to benefit various language pairs. We further perform a qualitative error analysis and find that the performance on all languages reaches competitive levels. However, erroneous prediction artifacts are introduced by annotation inconsistencies, differences in annotation guidelines, and vague entity labels in general.
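
To illustrate how scores for drug-name recognition are commonly computed, the following is a minimal sketch of strict span-level F1 over BIO-tagged token sequences. The abstract does not state the tagging scheme or the exact metric used in the paper, so the BIO scheme, the `DRUG` label, and the helper names `drug_spans` and `span_f1` are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def drug_spans(tags: List[str], label: str = "DRUG") -> Set[Span]:
    """Collect the spans of `label` from one BIO tag sequence.

    An I- tag that does not continue an open span is ignored (strict decoding).
    The label name "DRUG" is a hypothetical choice for this sketch.
    """
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues_span = tag == f"I-{label}" and start is not None
        if not continues_span:
            if start is not None:
                spans.add((start, i))
                start = None
            if tag == f"B-{label}":
                start = i
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Strict span-level F1: a prediction counts only if both boundaries match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = drug_spans(gold_tags), drug_spans(pred_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: the first sentence has a boundary error, the second is correct.
gold = [["O", "B-DRUG", "I-DRUG", "O"], ["B-DRUG", "O"]]
pred = [["O", "B-DRUG", "O", "O"], ["B-DRUG", "O"]]
score = span_f1(gold, pred)
```

In a cross-lingual setting, such a function would be applied once per source/target language pair, with `gold` taken from the target-language test set and `pred` produced by a model fine-tuned on the source language. Under strict matching, boundary disagreements of the kind caused by differing annotation guidelines count as both a false positive and a false negative.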
