Publication

A Hybrid Machine Learning Approach to Information Extraction

Günter Neumann

In: Abstract booklet of the 29th Annual Conference of the German Classification Society (GfKl 2005), Special Track on Text Mining. Annual Conference of the German Classification Society (GfKl), 3/2005.

Abstract

We present a hybrid machine learning approach to information extraction from unstructured documents that integrates a learned classifier based on Maximum Entropy Modelling (MEM) with a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism that an iterative tag-insertion algorithm applies to the tagging results of all active mappings. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.18% with the hybrid approach, compared with 79.27% for MEM and 51.85% for DOP when each is run in isolation.
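The abstract describes the hybrid mechanism only in outline: a vote over the tagging results of the active classifiers, applied by an iterative tag-insertion algorithm. The Python sketch below illustrates one possible reading under stated assumptions. Each classifier proposes labeled token spans with a confidence, the proposals are combined by a weighted vote, and tags are then inserted greedily in order of descending score, skipping any tag that overlaps one already inserted. The function `vote_and_insert`, the per-classifier `weights`, the confidence values, the `threshold` and the labels `COMPANY` and `TURNOVER` are all hypothetical illustrations, not the author's published algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical tag proposal: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def vote_and_insert(
    proposals: Dict[str, List[Tuple[Span, float]]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[Span]:
    """Merge tag proposals from several classifiers by weighted voting.

    Assumed scheme (not the paper's): each classifier contributes
    weight * confidence to a span's score, normalized by the total weight.
    """
    total_weight = sum(weights.values())
    scores: Dict[Span, float] = defaultdict(float)
    for name, spans in proposals.items():
        for span, conf in spans:
            scores[span] += weights[name] * conf / total_weight

    inserted: List[Span] = []
    # Iterative tag insertion: visit spans by descending score (ties broken
    # by span for determinism) and insert each one that clears the threshold
    # and does not overlap an already inserted tag.
    for span, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score < threshold:
            break  # remaining spans all score lower
        start, end, _ = span
        if all(end <= s or start >= e for s, e, _ in inserted):
            inserted.append(span)
    return sorted(inserted)


if __name__ == "__main__":
    # Hypothetical outputs of two classifiers on one sentence.
    proposals = {
        "MEM": [((0, 2, "COMPANY"), 0.9), ((5, 7, "TURNOVER"), 0.6)],
        "DOP": [((0, 2, "COMPANY"), 0.7), ((4, 7, "TURNOVER"), 0.8)],
    }
    weights = {"MEM": 0.6, "DOP": 0.4}
    print(vote_and_insert(proposals, weights, threshold=0.3))
    # prints [(0, 2, 'COMPANY'), (5, 7, 'TURNOVER')]
```

In this example both classifiers agree on the `COMPANY` span, so it receives the highest combined score (0.82). The two `TURNOVER` proposals overlap; the one backed by the more heavily weighted classifier (0.36 versus 0.32) is inserted first, and the other is then rejected as overlapping. Greedy insertion by score is only one plausible way to resolve such conflicts.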


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence