Building A German Clinical Named Entity Recognition System without In-domain Training Data

Siting Liang; Hans-Jürgen Profitlich; Maximilian Klass; Niko Möller-Grell; Celine-Fabienne Bergmann; Simon Heim; Christian Niklas; Daniel Sonntag
In: Association for Computational Linguistics. Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2024), June 17-21, Mexico City, Mexico, ACL Anthology, 2024.


Clinical Named Entity Recognition (NER) is essential for extracting important medical insights from clinical narratives. Expert-annotated training datasets for real-world clinical applications are hard to obtain because of data protection regulations and the lack of standardised entity types. This work is a collaborative initiative to build a German clinical NER system that addresses these obstacles. In response to the scarcity of training data, we propose a Conditional Relevance Learning (CRL) approach for low-resource transfer learning scenarios. CRL leverages a pre-trained language model and domain-specific open resources to obtain a robust base model tailored to clinical NER tasks, particularly when label sets change. This flexibility enables the implementation of a Multilayered Semantic Annotation (MSA) schema in our NER system, which can organize a diverse array of entity types, significantly improving the system's adaptability and utility across various clinical domains. In the case study, we demonstrate how our NER system can be applied to overcome resource constraints and comply with data privacy regulations. Because the system lacks prior training on in-domain data, feedback from expert users in the respective domains is essential for identifying areas for system refinement. Future work will focus on integrating expert feedback to improve system performance in specific clinical contexts.