Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies

Hans-Ulrich Krieger, Christian Spurk, Hans Uszkoreit, Feiyu Xu, Yi Zhang, Frank Müller, Thomas Tolxdorff

In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014). European Language Resources Association, 2014.


In this paper, we report on first attempts at, and findings from, analyzing German patient records using a hybrid parsing architecture and a combination of two relation extraction strategies. On a practical level, we are interested in extracting concepts and the relations among them, a necessary cornerstone for building medical information systems. The parsing pipeline consists of a morphological analyzer, a robust chunk parser adapted to the Latin phrases used in medical diagnoses, a repair rule stage, and a probabilistic context-free parser that respects the output of the chunker. The relation extraction stage combines two systems: SProUT, a shallow processor that uses hand-written rules to discover relation instances in local text units, and DARE, which extracts relation instances from complete sentences using rules learned in a bootstrapping process that starts from semantic seeds. Two small experiments have been carried out, covering the parsing pipeline and the relation extraction stage.
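The abstract states only that the probabilistic context-free parser "respects the output of the chunker". A minimal sketch of one way such a constraint could work, assuming it means that no constituent may cross a chunk boundary: a Viterbi CKY parser that leaves every chart cell empty when its span partially overlaps a chunk. The toy grammar, its probabilities, the English example sentence, and the names `LEXICON`, `BINARY`, `crosses` and `constrained_cky` are hypothetical illustrations, not the parser or grammar used in the paper.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy PCFG in Chomsky normal form (illustrative only, not the
# paper's grammar). LEXICON maps a word to (preterminal, P(A -> word)).
LEXICON: Dict[str, List[Tuple[str, float]]] = {
    "patient": [("NP", 0.3)],
    "fracture": [("NP", 0.3)],
    "femur": [("NP", 0.2)],
    "has": [("V", 1.0)],
    "of": [("P", 1.0)],
}
# Binary rules as (A, B, C, P(A -> B C)).
BINARY: List[Tuple[str, str, str, float]] = [
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

Back = Union[str, Tuple[int, str, str]]  # a word, or (split point, B, C)
Cell = Dict[str, Tuple[float, Back]]     # label -> (best log-prob, backpointer)
Chart = Dict[Tuple[int, int], Cell]


def crosses(i: int, j: int, chunks: List[Tuple[int, int]]) -> bool:
    """True if the half-open span [i, j) partially overlaps a chunk [a, b)."""
    return any(a < i < b < j or i < a < j < b for a, b in chunks)


def constrained_cky(words: List[str], chunks: List[Tuple[int, int]],
                    start: str = "S") -> Optional[str]:
    """Viterbi CKY that never builds a constituent crossing a chunk boundary."""
    n = len(words)
    if n == 0:
        return None
    chart: Chart = {}
    # Length-1 spans are filled directly from the lexicon.
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {a: (math.log(p), w) for a, p in LEXICON.get(w, [])}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell: Cell = {}
            # Spans that cross a chunk stay empty, so no parse can use them.
            if not crosses(i, j, chunks):
                for k in range(i + 1, j):
                    left, right = chart[(i, k)], chart[(k, j)]
                    for a, b, c, p in BINARY:
                        if b in left and c in right:
                            score = math.log(p) + left[b][0] + right[c][0]
                            if a not in cell or score > cell[a][0]:
                                cell[a] = (score, (k, b, c))
            chart[(i, j)] = cell
    if start not in chart[(0, n)]:
        return None
    return _tree(chart, 0, n, start)


def _tree(chart: Chart, i: int, j: int, label: str) -> str:
    """Follow backpointers to build a bracketed tree string."""
    _, back = chart[(i, j)][label]
    if isinstance(back, str):
        return f"({label} {back})"
    k, b, c = back
    return f"({label} {_tree(chart, i, k, b)} {_tree(chart, k, j, c)})"


if __name__ == "__main__":
    sent = "patient has fracture of femur".split()
    # No chunks: the toy grammar prefers verb-phrase attachment of "of femur".
    print(constrained_cky(sent, []))
    # (S (NP patient) (VP (VP (V has) (NP fracture)) (PP (P of) (NP femur))))
    # Chunk "fracture of femur" = tokens [2, 5): noun attachment is forced.
    print(constrained_cky(sent, [(2, 5)]))
    # (S (NP patient) (VP (V has) (NP (NP fracture) (PP (P of) (NP femur)))))
```

Without chunks, the toy grammar scores the verb-attachment reading higher (0.0144 vs. 0.0072). Once the chunker marks "fracture of femur" as a single unit, the crossing span "has fracture" is pruned and the noun-attachment reading wins. Emptying crossing cells outright is the simplest design; a softer variant could penalize such spans instead of forbidding them.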



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence