

A Hybrid Machine Learning Approach to Information Extraction

Günter Neumann
In: Abstract booklet of the 29th Annual Conference of the German Classification Society (GfKl 2005) - Special Track on Text Mining. Annual Conference of the German Classification Society (GfKl), GfKl, 3/2005.


We present a hybrid machine learning approach to information extraction from unstructured documents that integrates a learned classifier based on Maximum Entropy Modelling (MEM) with a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism applied by an iterative tag-insertion algorithm to the tagging results of all active mappings. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.18% with the hybrid approach, compared to 79.27% for MEM and 51.85% for DOP when each is run in isolation.
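The abstract does not spell out the voting mechanism, so the following is only a minimal sketch of how votes from several taggers could drive an iterative tag insertion. It is not the authors' algorithm. The function name `vote_and_insert`, the `(start, end, label, score)` proposal format, the per-classifier weights, and the no-overlap rule are all assumptions made for illustration.

```python
from collections import defaultdict


def vote_and_insert(proposals, weights=None):
    """Combine tag proposals from several classifiers (hypothetical sketch).

    proposals: dict mapping a classifier name (e.g. "mem", "dop") to a list
               of (start, end, label, score) tuples over token offsets,
               with end exclusive.
    weights:   optional dict of per-classifier vote weights (assumed 1.0).
    Returns the accepted (start, end, label) tags sorted by position.
    """
    weights = weights or {}

    # Accumulate weighted votes for every distinct candidate tag.
    votes = defaultdict(float)
    for name, spans in proposals.items():
        for start, end, label, score in spans:
            votes[(start, end, label)] += weights.get(name, 1.0) * score

    # Iteratively insert the highest-voted candidate that does not
    # overlap any tag that has already been inserted.
    accepted = []
    ranked = sorted(votes.items(), key=lambda item: -item[1])
    for (start, end, label), _total in ranked:
        if all(end <= s or start >= e for s, e, _ in accepted):
            accepted.append((start, end, label))

    return sorted(accepted)
```

In this sketch a candidate proposed by both classifiers collects both votes and is inserted first, while a conflicting lower-voted span is dropped. This is one simple way for a combined tagger to benefit from agreement between its components.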