The Effect of Error Rate in Artificially Generated Data for Automatic Preposition and Determiner Correction

Fraser Bowen, Jon Dehdari, Josef van Genabith

In: The Third Workshop on Noisy User-generated Text (W-NUT 2017): Proceedings of the Workshop. Workshop on Noisy User-generated Text (NUT-2017), co-located with EMNLP 2017, September 7, Copenhagen, Denmark, pages 68-76, ISBN 978-1-945626-94-4, Association for Computational Linguistics, 9/2017.


In this research we investigate how mismatches in error density and error type between training and test data affect a neural system that corrects preposition and determiner errors. We use synthetically produced training data to control error density and type, and "real" error data for testing. Our results show that it is possible to combine error types, although prepositions and determiners differ in how much error should be artificially introduced into the training data to obtain the best results.
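To make "controlling error density and type" concrete, here is a minimal sketch of artificial error injection, assuming a simple scheme that is not taken from the paper: each token belonging to a confusion set (prepositions or determiners) is corrupted with probability `error_rate`, either by deletion or by substitution with another member of the same set. The confusion sets, the 50/50 split between deletion and substitution, and the function name `inject_errors` are all hypothetical illustrations, not the authors' method.

```python
import random
from typing import List, Sequence

# Hypothetical confusion sets for illustration only; not the paper's lists.
PREPOSITIONS = ["in", "on", "at", "for", "to", "of", "with", "by", "from"]
DETERMINERS = ["a", "an", "the"]


def inject_errors(tokens: Sequence[str], error_rate: float,
                  confusion_set: Sequence[str],
                  rng: random.Random) -> List[str]:
    """Return a noisy copy of `tokens`.

    Each token whose lowercase form is in `confusion_set` is corrupted with
    probability `error_rate` (the assumed notion of error density). A
    corrupted token is either deleted or replaced by a *different* member of
    the confusion set, chosen with equal probability. All other tokens are
    copied unchanged.
    """
    noisy: List[str] = []
    for tok in tokens:
        lower = tok.lower()
        if lower in confusion_set and rng.random() < error_rate:
            if rng.random() < 0.5:
                continue  # deletion error: drop the token entirely
            alternatives = [w for w in confusion_set if w != lower]
            noisy.append(rng.choice(alternatives))  # substitution error
        else:
            noisy.append(tok)
    return noisy


if __name__ == "__main__":
    rng = random.Random(42)
    sentence = "the cat sat on the mat in the garden".split()
    # Combining error types: corrupt prepositions and determiners in two
    # passes, each with its own (hypothetical) error rate.
    noisy = inject_errors(sentence, 0.2, PREPOSITIONS, rng)
    noisy = inject_errors(noisy, 0.1, DETERMINERS, rng)
    print(" ".join(noisy))
```

Using a separate rate per confusion set is one simple way to reflect the finding that prepositions and determiners may need different amounts of artificial error; the actual rates and error model used in the study are not given here.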

German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI)