A Hybrid Machine Learning Approach for Information Extraction from Free Texts

Günter Neumann

In: M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, W. Gaul (Eds.). From Data and Information Analysis to Knowledge Engineering. Pages 390-397. Studies in Classification, Data Analysis, and Knowledge Organization. ISBN 3-540-31313-3. Springer-Verlag, Berlin, Heidelberg, New York, 2006.


We present a hybrid machine learning approach for information extraction from unstructured documents. It integrates a classifier learned with Maximum Entropy Modeling (MEM) and a classifier based on our work on Data-Oriented Parsing (DOP). The hybrid behavior is achieved through a voting mechanism applied by an iterative tag-insertion algorithm. We tested the method on a corpus of German newspaper articles about company turnover. The hybrid approach achieved an F-measure of 85.2%, compared to 79.3% for MEM and 51.9% for DOP when each classifier is run in isolation.
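The abstract does not spell out how the vote or the tag insertion works. The following is a minimal, hypothetical Python sketch of one way two classifiers' proposals could be combined. Each classifier (labeled "mem" and "dop" here) assigns a confidence to candidate tagged spans. A weighted sum of those confidences serves as the vote. Tags are then inserted greedily, highest score first, and any remaining candidate that overlaps an inserted tag is discarded. The names `Candidate`, `vote` and `insert_tags`, along with the weights, the threshold and the slot labels, are assumptions made for illustration. This is not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical illustration only: not the voting scheme used in the paper.


@dataclass(frozen=True)
class Candidate:
    start: int  # token index where the tagged span begins (inclusive)
    end: int    # token index where the tagged span ends (exclusive)
    label: str  # hypothetical template slot, e.g. "COMPANY" or "TURNOVER"


def vote(proposals: Dict[str, Dict[Candidate, float]],
         weights: Dict[str, float]) -> Dict[Candidate, float]:
    """Sum each classifier's confidence for a candidate, scaled by that classifier's weight."""
    scores: Dict[Candidate, float] = {}
    for name, candidates in proposals.items():
        for cand, conf in candidates.items():
            scores[cand] = scores.get(cand, 0.0) + weights[name] * conf
    return scores


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two half-open token spans share at least one token."""
    return a.start < b.end and b.start < a.end


def insert_tags(proposals: Dict[str, Dict[Candidate, float]],
                weights: Dict[str, float],
                threshold: float = 0.5) -> List[Candidate]:
    """Iteratively insert the best-scoring tag until no remaining candidate reaches the threshold."""
    remaining = vote(proposals, weights)
    inserted: List[Candidate] = []
    while remaining:
        best = max(remaining, key=remaining.__getitem__)
        if remaining[best] < threshold:
            break
        inserted.append(best)
        # Drop the chosen candidate and every candidate that conflicts with it.
        remaining = {c: s for c, s in remaining.items()
                     if c != best and not overlaps(c, best)}
    return sorted(inserted, key=lambda c: c.start)


# Usage on a hypothetical 8-token sentence with made-up confidences.
proposals = {
    "mem": {Candidate(0, 1, "COMPANY"): 0.9, Candidate(5, 8, "TURNOVER"): 0.6},
    "dop": {Candidate(0, 1, "COMPANY"): 0.7, Candidate(5, 7, "TURNOVER"): 0.4,
            Candidate(3, 4, "COMPANY"): 0.3},
}
weights = {"mem": 0.6, "dop": 0.4}
print(insert_tags(proposals, weights, threshold=0.3))
```

In this toy run, the span both classifiers agree on scores highest and is inserted first. The longer MEM turnover span then wins over the overlapping shorter DOP span, and the weak DOP-only candidate falls below the threshold. A greedy, non-overlapping insertion keeps the sketch short. A real system might instead re-run the classifiers after each insertion so that later decisions can use the tags already placed.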


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence