Publication

Multi-lingual ICD-10 Coding using a Hybrid rule-based and Supervised Classification Approach at CLEF eHealth 2017

Jurica Seva; Madeleine Kittner; Roland Roller; Ulf Leser

In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum. Conference and Labs of the Evaluation Forum (CLEF-2017), Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, September 11-14, Dublin, Ireland, CEUR Workshop Proceedings, Vol. 1866, CEUR-WS.org, 2017.

Abstract

In this paper, we present our approach and results for the CLEF eHealth 2017 challenge, Track 1. The task involves recognizing ICD-10 codes in English and French death certificates and mapping them to the certificate text. Our approach is a two-tier, two-stage process. First, we use a rule-based system, built on handcrafted rules and Apache Solr, to perform Named Entity Recognition (NER) of ICD-10 codes. This step produces a set of possible candidates extracted from the input text. Next, we use tf-idf weighted character n-gram classification models to normalize and rank the previously generated ICD-10 candidate set. The classification models follow the hierarchical structure of the given ICD-10 dictionaries: we create individual models for the first two hierarchical levels (chapters and blocks). Finally, the top candidate in the overlap between the list of possible ICD-10 code candidates (input list) and the ranked list of final ICD-10 candidates (output list) is taken as the final ICD-10 code. Although the ICD-10 candidate NER is language-dependent, the normalization and ranking of candidates use a language-independent approach.
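The ranking and overlap steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the trained classification models for a plain cosine-similarity ranking over tf-idf weighted character n-grams, it does not model the chapter and block hierarchy, and the names `char_ngrams`, `TfidfRanker`, and `pick_code`, as well as the toy dictionary entries, are assumptions made for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of the lowercased, space-padded text."""
    s = f" {text.lower().strip()} "
    return [s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)]


class TfidfRanker:
    """Ranks ICD-10 codes by cosine similarity of tf-idf character n-grams.

    Assumption: a nearest-neighbour stand-in for the classification models
    mentioned in the abstract, not a reproduction of them.
    """

    def __init__(self, dictionary):
        # dictionary: list of (code, term) pairs (hypothetical toy data)
        self.entries = dictionary
        docs = [Counter(char_ngrams(term)) for _, term in dictionary]
        n_docs = len(docs)
        # Document frequency: in how many terms each n-gram occurs
        df = Counter(gram for doc in docs for gram in doc)
        # Smoothed inverse document frequency
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0
                    for g, c in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts):
        """Turn raw n-gram counts into an L2-normalised tf-idf vector."""
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def rank(self, text):
        """Return all dictionary codes, best match first."""
        query = self._weigh(Counter(char_ngrams(text)))
        scores = {}
        for (code, _), vec in zip(self.entries, self.vectors):
            sim = sum(w * vec.get(g, 0.0) for g, w in query.items())
            scores[code] = max(scores.get(code, 0.0), sim)
        return sorted(scores, key=scores.get, reverse=True)


def pick_code(candidates, ranked):
    """Return the highest-ranked code that is also in the candidate set."""
    for code in ranked:
        if code in candidates:
            return code
    return None


# Toy dictionary (illustrative entries, not taken from the task data)
ranker = TfidfRanker([
    ("I21.9", "acute myocardial infarction"),
    ("J18.9", "pneumonia"),
    ("I64", "stroke"),
    ("C34.9", "lung cancer"),
])

# Candidate set as a rule-based NER stage might produce it (assumed here)
candidates = {"I21.9", "J18.9"}
best = pick_code(candidates, ranker.rank("myocardial infarct"))
```

Because the features are character n-grams rather than words, the same ranking code can be applied to English and French dictionaries without language-specific tokenization, which is the sense in which the second stage is language-independent.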

Further Links

http://ceur-ws.org/Vol-1866/paper_70.pdf