Publication

MLT-DFKI at CLEF eHealth 2019: Multi-label Classification of ICD-10 Codes with BERT

Saadullah Amin, Günter Neumann, Katherine Dunfield, Anna Vechkaeva, Kathryn Annette Chapman, Morgan Kelly Wixted

In: CLEF 2019 Working Notes. Conference and Labs of the Evaluation Forum (CLEF-2019), 10th Conference and Labs of the Evaluation Forum, September 9-12, Lugano, Switzerland, CEUR-WS.org, 9/2019.

Abstract

With the adoption of electronic health record (EHR) systems, hospitals and clinical institutes have access to large amounts of heterogeneous patient data. Such data consists of structured documents (insurance details, billing data, lab results, etc.) and unstructured documents (doctor notes, admission and discharge details, medication steps, etc.), of which the latter are of great significance for applying natural language processing (NLP) techniques. In parallel, recent advancements in transfer learning for NLP have pushed the state of the art to new limits on many language understanding tasks. Therefore, in this paper, we present team DFKI-MLT's participation in CLEF eHealth 2019 Task 1, which involves automatically assigning ICD-10 codes to non-technical summaries (NTSs) of animal experiments. We use various architectures in a multi-label classification setting and demonstrate the effectiveness of transfer learning with the pre-trained language representation model BERT (Bidirectional Encoder Representations from Transformers) and its recent variant BioBERT. We first translate the task documents from German to English using an automatic translation system and then use BioBERT, which achieves an F1-micro of 73.02% on the submitted run as evaluated by the challenge.
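To make the reported metric concrete, the following is a minimal sketch of how a micro-averaged F1 score can be computed in a multi-label setting, where each document may carry several codes. It is not the challenge's official evaluation script; the function name `micro_f1` and the labels `"A"`, `"B"`, `"C"` are hypothetical placeholders, not real ICD-10 codes from the task data.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions.

    gold, pred: lists of sets, one set of labels per document.
    Counts are pooled over all documents and labels before
    precision and recall are computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder labels (assumption): three documents, three possible codes.
gold = [{"A", "B"}, {"C"}, {"A"}]
pred = [{"A"}, {"C", "B"}, {"A"}]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.2%}")
```

Because the counts are pooled before averaging, frequent codes influence the score more than rare ones, which is one reason micro-averaging is commonly reported for label sets with a skewed distribution.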

Further Links

CLEF_2019_paper_67.pdf (pdf, 380 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence