DFKI-LT - Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web
Proceedings of the 11th International Semantic Web Conference, Boston, Massachusetts, USA, Springer, 11/2012
We present a large-scale, domain-adaptive relation extraction (RE) system that learns grammar-based RE rules from the Web, using large numbers of known relation instances as seeds. The system detects not only binary relations but also n-ary relations such as events. Our goal is to discover rule sets large enough to cover the actual range of linguistic variation, thus solving the notorious long-tail problem of real-world applications for the Semantic Web. The system uses distant supervision, taking Freebase as the seed source and the Web as the learning corpus. A novel variant of distant supervision learns many relations in parallel, which enables a new method of rule filtering. In the experiments, 39 semantic relations are targeted with 2.8 million seed instances extracted from Freebase. Three million sentences extracted from 20 million web pages serve as the basis for learning an average of 40,000 distinct rules per relation. Given an efficient dependency parser, the average running time per relation is only 19 hours. Evaluation on the ACE '05 data and a specially annotated corpus shows high recall. A comparison with a baseline system learning from a smaller corpus shows that, even with bootstrapping and the same massive seed, the recall of Web-based learning cannot be matched. Rule filtering effectively improves precision.
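The distant-supervision idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the seed data, the function names (`label_sentences`, `toy_pattern`, `filter_rules`), and the `min_share` threshold are hypothetical. Plain substring matching stands in for entity recognition, and a slot-filled string stands in for a dependency-grammar rule. The frequency-share filter is only one plausible way that learning many relations in parallel could support rule filtering; the abstract does not specify the actual criterion.

```python
from collections import Counter

# Hypothetical seed instances (relation -> argument tuples), standing in for Freebase.
SEEDS = {
    "marriage": {("Marie Curie", "Pierre Curie")},
    "birth_place": {("Marie Curie", "Warsaw")},
}

# Hypothetical corpus sentences, standing in for Web text.
SENTENCES = [
    "Marie Curie married Pierre Curie in 1895.",
    "Marie Curie was born in Warsaw.",
    "Pierre Curie worked with Marie Curie in Paris.",
]


def label_sentences(seeds, sentences):
    """Pair a sentence with a relation when all arguments of a seed occur in it."""
    labeled = []
    for sentence in sentences:
        for relation, instances in seeds.items():
            for args in instances:
                if all(arg in sentence for arg in args):
                    labeled.append((relation, args, sentence))
    return labeled


def toy_pattern(args, sentence):
    """Replace seed arguments with slots (a toy stand-in for a grammar-based rule)."""
    pattern = sentence
    for i, arg in enumerate(args):
        pattern = pattern.replace(arg, f"ARG{i}")
    return pattern


def filter_rules(rule_counts, min_share=0.5):
    """Keep a rule for a relation only if that relation accounts for at least
    min_share of the rule's occurrences across all relations (assumed criterion)."""
    totals = Counter()
    for counts in rule_counts.values():
        totals.update(counts)
    return {
        relation: {rule for rule, n in counts.items() if n / totals[rule] >= min_share}
        for relation, counts in rule_counts.items()
    }


labeled = label_sentences(SEEDS, SENTENCES)
patterns = [(rel, toy_pattern(args, sent)) for rel, args, sent in labeled]
```

The third sentence is matched to `marriage` even though it does not express that relation, which is the kind of noise distant supervision produces. A filter such as `filter_rules` can then discard rules that occur mostly with other relations, given per-relation rule frequencies.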
Files: BibTeX, distantly_supervised_dare.pdf