Publication

Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies

Hans-Ulrich Krieger; Christian Spurk; Hans Uszkoreit; Feiyu Xu; Yi Zhang; Frank Müller; Thomas Tolxdorff

In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014). European Language Resources Association, 2014.

Abstract

In this paper, we report on first attempts at analyzing German patient records, and on our findings, using a hybrid parsing architecture and a combination of two relation extraction strategies. On a practical level, we are interested in extracting concepts and the relations among them, a necessary cornerstone for building medical information systems. The parsing pipeline consists of a morphological analyzer, a robust chunk parser adapted to Latin phrases used in medical diagnosis, a repair rule stage, and a probabilistic context-free parser that respects the output from the chunker. The relation extraction stage combines two systems: SProUT, a shallow processor that uses hand-written rules to discover relation instances in local text units, and DARE, which extracts relation instances from complete sentences using rules learned in a bootstrapping process that starts from semantic seeds. Two small experiments have been carried out for the parsing pipeline and the relation extraction stage.

Projects

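MEDIXIN - Medizinische Informationsextraktion und Fragebeantwortung für Informationsdienste im Bereich Gesundheit (Medical Information Extraction and Question Answering for Information Services in the Health Sector)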

abparser.pdf (pdf, 176 KB)