Publication

Automatic Detection and Correction of Errors in Dependency Treebanks

Alexander Volokh, Günter Neumann

In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT-2011), June 19-24, Portland, Oregon, United States. Association for Computational Linguistics, 2011.

Abstract

Annotated corpora are essential for almost all NLP applications. Although they are expected to be of very high quality because of their importance for follow-up developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the number of errors and propose a method for their automatic correction. Although our approach finds only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of the process of its creation, it has very high precision and is thus in any case beneficial for the quality of the corpus it is applied to. Finally, we compare it to a different method for error detection in treebanks and find that the errors each method detects are mostly different, so the two approaches are complementary.

Projects

treebankErrors.pdf (PDF, 149 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence