Ensemble-style Self-training on Citation Classification

Cailing Dong, Ulrich Schäfer

In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP-2011), November 8-13, Chiang Mai, Thailand, pages 623-631. Association for Computational Linguistics, 11/2011. ISBN 978-974-466-564-5.


Classification of citations into categories such as use, refutation, comparison, etc. may have several relevant applications for digital libraries, such as paper browsing aids, reading recommendations, qualified citation indexing, or fine-grained impact factor calculation. Most citation classification approaches described so far rely heavily on rule systems and patterns tailored to specific scientific domains. We focus on a less manual approach by learning domain-insensitive features from textual, physical, and syntactic aspects. Our experiments show the effectiveness of this feature set with various machine learning algorithms on datasets of different sizes. Furthermore, we build an ensemble-style self-training classification model that achieves better classification performance using only a small amount of training data, which substantially reduces the manual annotation effort in this task.
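The abstract does not spell out the self-training procedure. As a rough illustration of the general idea only, the Python sketch below implements one common variant of ensemble-style self-training: several base classifiers are trained on the small labeled set, unlabeled instances on which all of them agree receive that label and join the training set, and the process repeats until no new agreements appear. The two toy learners (nearest centroid and 1-nearest-neighbour), the function names, the unanimity criterion, and the two-dimensional numeric feature vectors are all hypothetical assumptions; they are not the paper's feature set, classifiers, or selection rule.

```python
# Hypothetical illustration of ensemble-style self-training; not the authors' implementation.
from collections import Counter
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Predictor = Callable[[Vector], str]
Learner = Callable[[List[Vector], List[str]], Predictor]


def nearest_centroid(xs: List[Vector], ys: List[str]) -> Predictor:
    """Fit a nearest-centroid classifier and return its predict function."""
    counts = Counter(ys)
    sums: Dict[str, List[float]] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda label: dist(x, centroids[label]))


def one_nearest_neighbour(xs: List[Vector], ys: List[str]) -> Predictor:
    """Fit a 1-nearest-neighbour classifier (it simply stores the data)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: dist(x, p[0]))[1]


def ensemble_self_train(
    labeled_x: Sequence[Vector],
    labeled_y: Sequence[str],
    unlabeled: Sequence[Vector],
    learners: Sequence[Learner],
    max_rounds: int = 10,
) -> Tuple[List[Vector], List[str], List[Vector]]:
    """Grow the labeled set with unlabeled points on which all learners agree."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Retrain every base learner on the current (growing) labeled set.
        models = [learn(xs, ys) for learn in learners]
        agreed: List[Tuple[Vector, str]] = []
        remaining: List[Vector] = []
        for x in pool:
            votes = {model(x) for model in models}
            if len(votes) == 1:  # unanimous vote: accept it as a pseudo-label
                agreed.append((x, votes.pop()))
            else:  # disagreement: keep the point unlabeled for now
                remaining.append(x)
        if not agreed:  # no new pseudo-labels, so further rounds change nothing
            break
        for x, y in agreed:
            xs.append(x)
            ys.append(y)
        pool = remaining
    return xs, ys, pool


# Toy usage with made-up feature vectors and two citation labels.
xs, ys, leftover = ensemble_self_train(
    [(0.0, 0.0), (0.0, 10.0), (6.0, 5.0), (8.0, 5.0)],
    ["use", "use", "comparison", "comparison"],
    [(0.5, 1.0), (7.0, 6.0), (3.0, 5.0)],
    [nearest_centroid, one_nearest_neighbour],
)
print(ys[4:], leftover)  # ['use', 'comparison'] [(3.0, 5.0)]
```

In this toy run the two learners agree on the first two unlabeled points, which are added as pseudo-labeled examples, but they keep disagreeing on (3.0, 5.0), so that point stays unlabeled. Requiring agreement among dissimilar learners is one way to keep noisy pseudo-labels out of the growing training set.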



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence