Publication

Automatic Detection and Correction of Errors in Dependency Treebanks

Alexander Volokh; Günter Neumann
In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT-2011), June 19-24, Portland, Oregon, USA, Association for Computational Linguistics, 2011.

Abstract

Annotated corpora are essential for almost all NLP applications. Although they are expected to be of very high quality because downstream work depends on them, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the number of errors and propose a method for their automatic correction. Our approach finds only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of its creation process, but it has very high precision and is therefore beneficial in any case for the quality of the corpus it is applied to. Finally, we compare it to a different method for error detection in treebanks and find that the errors we are able to detect are mostly different and that the two approaches are complementary.

Projects