DFKI-LT - Ensemble-style Self-training on Citation Classification

Cailing Dong, Ulrich Schäfer
Ensemble-style Self-training on Citation Classification
Proceedings of the 5th International Joint Conference on Natural Language Processing, Pages 623-631, Chiang Mai, Thailand, Association for Computational Linguistics, 11/2011
 
Classifying citations into categories such as use, refutation, or comparison may have several relevant applications for digital libraries, including paper browsing aids, reading recommendations, qualified citation indexing, and fine-grained impact factor calculation. Most citation classification approaches described so far rely heavily on rule systems and patterns tailored to specific scientific domains. We focus on a less manual approach that learns domain-insensitive features from textual, physical, and syntactic aspects. Our experiments show the effectiveness of this feature set with various machine learning algorithms on datasets of different sizes. Furthermore, we build an ensemble-style self-training classification model that achieves better classification performance using only a small amount of training data, which greatly reduces the manual annotation work in this task.
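
As a rough illustration of the general idea of ensemble-style self-training, the following minimal Python sketch trains several simple classifiers on a small labeled set, moves unlabeled examples into the training set when all ensemble members agree on their label, and repeats. It is not the algorithm from the paper: the nearest-centroid base learners, the unanimous-agreement rule, the function names, and the toy feature vectors are all assumptions made for this example.

```python
from collections import Counter


def centroid_trainer(features):
    """Hypothetical base learner: nearest-centroid on a subset of feature indices."""
    def train(X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
        centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

        def predict(row):
            def dist(lab):
                return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[lab]))
            # sorted() makes tie-breaking deterministic
            return min(sorted(centroids), key=dist)

        return predict
    return train


def ensemble_self_train(trainers, X_lab, y_lab, X_unlab, rounds=5):
    """Self-training loop; the unanimous-agreement rule is an assumption of this sketch."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        models = [train(X_lab, y_lab) for train in trainers]
        agreed, rest = [], []
        for row in pool:
            votes = {model(row) for model in models}
            if len(votes) == 1:  # every ensemble member predicts the same label
                agreed.append((row, votes.pop()))
            else:
                rest.append(row)
        if not agreed:  # nothing new to learn from, stop early
            break
        for row, label in agreed:
            X_lab.append(row)
            y_lab.append(label)
        pool = rest
    models = [train(X_lab, y_lab) for train in trainers]

    def predict(row):
        # final prediction by majority vote over the ensemble
        return Counter(model(row) for model in models).most_common(1)[0][0]

    return predict


# Toy data: two invented numeric features per citation, two of the categories
# named in the abstract. The values are illustrative only.
trainers = [centroid_trainer([0]), centroid_trainer([1]), centroid_trainer([0, 1])]
X_labeled = [[0.0, 0.0], [1.0, 1.0]]
y_labeled = ["use", "comparison"]
X_unlabeled = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
predict = ensemble_self_train(trainers, X_labeled, y_labeled, X_unlabeled)
print(predict([0.05, 0.1]), predict([0.95, 0.9]))
```

Requiring unanimous agreement is a conservative choice that limits how many wrongly labeled examples enter the training set; a confidence threshold or a plain majority vote would be other plausible variants.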
 
Files: BibTeX, I11-1070.pdf