Minimally Supervised Rule Learning for the Extraction of Biographic Information from Various Social Domains

Hong Li; Feiyu Xu; Hans Uszkoreit

In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2011), September 12-14, Hissar, Bulgaria, 2011.


This paper investigates the application of an existing seed-based minimally supervised learning algorithm to three social domains whose available data exhibit different properties. A systematic analysis examines the data properties of each domain, including the distribution of the semantic arguments and of their combinations. The experimental results confirm that data properties strongly influence the performance of the learning system. The main results are insights into: (i) the effects of data properties such as redundancy and the frequency of argument mentions on coverage and precision; (ii) the positive effects of negative examples, if they are used effectively; (iii) the different effects of negative examples depending on the data properties of the domain; and (iv) the potential of reusing rules learned in one domain to improve relation extraction performance in another domain.
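The abstract does not describe the learning algorithm itself. Purely as an illustration of the general idea, the Python sketch below shows a generic seed-based bootstrapping loop in which negative examples are used to filter out imprecise rules. It is not the paper's algorithm: the flat (argument, pattern, argument) mention format, the function name `bootstrap`, the `max_neg_ratio` threshold and the toy data are all hypothetical simplifications chosen for this example.

```python
from collections import Counter
from collections.abc import Iterable

# Hypothetical toy representation (not the paper's rule format):
# a mention is (argument 1, connecting pattern, argument 2).
Mention = tuple[str, str, str]
Pair = tuple[str, str]


def bootstrap(
    mentions: Iterable[Mention],
    seeds: set[Pair],
    negatives: set[Pair],
    max_neg_ratio: float = 0.2,  # hypothetical filtering threshold
    max_iter: int = 10,
) -> tuple[set[str], set[Pair]]:
    """Generic seed-based bootstrapping with negative-example rule filtering."""
    mentions = list(mentions)
    instances: set[Pair] = set(seeds)
    rules: set[str] = set()
    for _ in range(max_iter):
        # Rule learning: any pattern linking a known instance becomes a candidate.
        candidates = {p for a, p, b in mentions if (a, b) in instances}
        # Rule filtering: count every match of each candidate pattern and the
        # matches that hit a pair known NOT to belong to the target relation.
        total: Counter[str] = Counter()
        neg: Counter[str] = Counter()
        for a, p, b in mentions:
            if p in candidates:
                total[p] += 1
                if (a, b) in negatives:
                    neg[p] += 1
        kept = {p for p in candidates if neg[p] <= max_neg_ratio * total[p]}
        # Rule application: extract new instances with the surviving rules.
        found = {(a, b) for a, p, b in mentions
                 if p in kept and (a, b) not in negatives}
        if kept <= rules and found <= instances:
            break  # fixed point: no new rules and no new instances
        rules |= kept
        instances |= found
    return rules, instances


# Hypothetical toy data for a "place of birth" relation.
mentions = [
    ("Alice", "was born in", "Paris"),
    ("Bob", "was born in", "Rome"),
    ("Carol", "was born in", "Oslo"),
    ("Alice", "lives in", "Berlin"),
    ("Bob", "lives in", "Madrid"),
    ("Carol", "lives in", "Oslo"),
    ("Dave", "lives in", "Lima"),
    ("Dave", "hometown", "Cusco"),
    ("Bob", "hometown", "Rome"),
]
seeds = {("Alice", "Paris")}
negatives = {("Alice", "Berlin"), ("Bob", "Madrid")}

rules, instances = bootstrap(mentions, seeds, negatives)
print(sorted(rules))  # ['hometown', 'was born in']
print(sorted(instances))
```

In this toy run, "lives in" becomes a candidate rule only because Carol happens to live in her birthplace; the negative examples show that half of its matches are wrong, so it is discarded. Calling `bootstrap(mentions, seeds, set())` without negatives keeps "lives in" and adds three incorrect pairs, which mirrors in miniature the precision effect of negative examples mentioned in point (ii).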


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence