Extending dependency treebanks with good sentences

Alexander Volokh, Günter Neumann

In: Jeremy Jancsary (editor). KONVENS-2012. 11th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS-12), September 19-21, Vienna, Austria. Eigenverlag ÖGAI, 2012.


For many resource-poor languages, additional annotated data would be beneficial. However, the annotation process is tedious and expensive. We propose a metric for selecting the most promising sentences for annotation. Annotating only good sentences saves time and allows better results to be achieved even with a smaller amount of annotated data. We demonstrate how our method works using the example of parsing a Finnish dependency treebank with MaltParser.


30_volokh12p.pdf (PDF, 186 KB)

German Research Center for Artificial Intelligence