DFKI-LT - Analysis and Improvement of Minimally Supervised Machine Learning for Relation Extraction
14th International Conference on Applications of Natural Language to Information Systems, Saarbrücken, Germany, Springer, 2009
The main contribution of this paper is a systematic analysis of a minimally supervised machine learning method for relation extraction grammars. The method is based on a bootstrapping approach in which the bootstrapping is triggered by semantic seeds. The starting point of our analysis is the pattern-learning graph, which is a subgraph of the bipartite graph representing all connections between linguistic patterns and relation instances exhibited by the data. It is shown that the performance of such a general learning framework on actual tasks depends on certain properties of the data and on the selection of seeds. Several experiments have been conducted to gain explanatory insights into the interaction of these two factors. From the investigation of more effective seeds and benevolent data, we learn how to improve the learning in less fortunate configurations. A relation extraction method based only on positive examples cannot avoid all false positives, especially when the data properties yield a high recall. Therefore, negative seeds are employed to learn negative patterns, which boost precision.
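The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the data is already reduced to (pattern, instance) pairs, it accepts every pattern that matches a known instance without any scoring, and it treats any pattern that matches a negative seed as a negative pattern. The function name `bootstrap` and the toy labels (`P1`, `i1`, `bad1`, ...) are hypothetical.

```python
from collections import defaultdict


def bootstrap(edges, seeds, negative_seeds=frozenset()):
    """Expand seed instances over a bipartite pattern/instance graph.

    edges: iterable of (pattern, instance) pairs observed in the data.
    Returns the sets of learned patterns and extracted instances.
    Simplifying assumption: no confidence scoring or filtering is applied.
    """
    by_pattern = defaultdict(set)
    by_instance = defaultdict(set)
    for pattern, instance in edges:
        by_pattern[pattern].add(instance)
        by_instance[instance].add(pattern)

    # Assumption: a pattern matching any negative seed is a negative pattern.
    negative_patterns = set()
    for inst in negative_seeds:
        negative_patterns |= by_instance.get(inst, set())

    instances = set(seeds)
    patterns = set()
    frontier = set(seeds)
    while frontier:
        # Learn new patterns from the instances found in the last round.
        new_patterns = set()
        for inst in frontier:
            new_patterns |= by_instance.get(inst, set())
        new_patterns -= patterns | negative_patterns
        patterns |= new_patterns

        # Apply the new patterns to extract further instances.
        frontier = set()
        for pattern in new_patterns:
            frontier |= by_pattern[pattern]
        frontier -= instances
        instances |= frontier
    return patterns, instances


# Toy graph: P3 links a good instance (i3) to a false positive (bad1);
# P4/i9 form a component that the seed i1 cannot reach.
EDGES = [
    ("P1", "i1"), ("P1", "i2"),
    ("P2", "i2"), ("P2", "i3"),
    ("P3", "i3"), ("P3", "bad1"),
    ("P4", "i9"),
]

positive_only = bootstrap(EDGES, {"i1"})
with_negative = bootstrap(EDGES, {"i1"}, negative_seeds={"bad1"})
```

In this toy graph, the positive-only run reaches `bad1` through `P3`, while the run with a negative seed blocks `P3` and stops at `i3`. The instance `i9` is never found from seed `i1`, which shows in miniature how results depend on both the connectivity of the data and the choice of seeds.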