DFKI-LT - A Hybrid Machine Learning Approach to Information Extraction

Günter Neumann
A Hybrid Machine Learning Approach to Information Extraction
In: Abstract booklet of the 29th Annual Conference of the German Classification Society (GfKl 2005), Special Track on Text Mining, GfKl, 3/2005
 
We present a hybrid machine learning approach to information extraction from unstructured documents that integrates a classifier learned with Maximum Entropy Modelling (MEM) and a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism, applied by an iterative tag-insertion algorithm to the tagging results of all active mappings. We tested the method on a corpus of German newspaper articles about company turnover and achieved an F-measure of 85.18% with the hybrid approach, compared to 79.27% for MEM and 51.85% for DOP when each is run in isolation.
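
The abstract does not spell out the voting mechanism or the tag-insertion algorithm, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes each classifier proposes tagged token spans with a confidence score, that identical proposals pool their scores, and that tags are inserted greedily from best to worst while skipping overlaps. The names `Proposal` and `vote_and_insert`, the scores, and the slot labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposal(NamedTuple):
    start: int    # index of the first token in the span
    end: int      # index one past the last token in the span
    tag: str      # slot label (hypothetical, e.g. "COMPANY")
    score: float  # classifier confidence (assumed to be available)


def vote_and_insert(proposals_by_classifier):
    """Combine span proposals from several classifiers (illustrative only)."""
    # Voting step: identical (span, tag) proposals add up their scores.
    votes = defaultdict(float)
    for proposals in proposals_by_classifier.values():
        for p in proposals:
            votes[(p.start, p.end, p.tag)] += p.score

    # Insertion step: take candidates from highest to lowest pooled score
    # and insert a tag only if it does not overlap an already inserted one.
    accepted = []
    ranked = sorted(votes.items(), key=lambda item: -item[1])
    for (start, end, tag), _score in ranked:
        if all(end <= s or start >= e for s, e, _t in accepted):
            accepted.append((start, end, tag))
    return sorted(accepted)


# Made-up proposals from two classifiers for one sentence.
mem = [Proposal(0, 2, "COMPANY", 0.6), Proposal(5, 6, "AMOUNT", 0.7)]
dop = [Proposal(0, 2, "COMPANY", 0.5), Proposal(1, 3, "YEAR", 0.9)]
result = vote_and_insert({"MEM": mem, "DOP": dop})
print(result)
```

In this toy input both classifiers agree on the `COMPANY` span, so its pooled score outranks the conflicting `YEAR` span, which is then skipped because it overlaps. The non-overlapping `AMOUNT` span is still inserted.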
 
Files: BibTeX, GN-GFKL2005.pdf