How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

Georg Heigold; Stalin Varanasi; Günter Neumann; Josef van Genabith
In: Association for Machine Translation in the Americas (AMTA) 2018. Conference of the Association for Machine Translation in the Americas (AMTA-2018), March 17-21, Boston, Massachusetts, USA, AMTA, 2018.


This paper investigates the robustness of NLP against perturbed word forms. While neural approaches can achieve (almost) human-like accuracy for certain tasks and conditions, they are often sensitive to small changes in the input such as non-canonical input (e.g., typos). Yet both stability and robustness are desired properties in applications involving user-generated content, and all the more so as humans easily cope with such noisy or adversarial conditions. In this paper, we study the impact of noisy input. We consider different noise distributions (different density and different types) and mismatched noise distributions for training and testing. Moreover, we empirically evaluate the robustness of different models (convolutional neural networks, recurrent neural networks, non-neural models), different basic units (characters, byte pair encoding units, and words), and different NLP tasks (morphological tagging, machine translation). Our experiments confirm that (i) noisy input substantially degrades the output of models trained on clean data, that (ii) training on noisy data can help models achieve performance on noisy data similar to that of models trained on clean data tested on clean data, that (iii) models trained on noisy data can achieve good results on noisy data almost without performance loss on clean data, that (iv) error type mismatches between training and test data can have a greater impact than error density mismatches, that (v) character-based approaches are almost always better than byte pair encoding (BPE) approaches with noisy data, that (vi) the choice of neural models (recurrent, convolutional) is not significant, and that (vii) for morphological tagging, under the same data conditions, the neural models outperform a conditional random field (CRF) based model.
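To make the notions of noise type and noise density concrete, the following is a minimal sketch of two perturbations of the kind the title alludes to (word scrambling and random character noise). It is not the authors' implementation: the function names `scramble_word`, `flip_chars`, and `perturb`, the choice to keep the first and last character fixed when scrambling, and the lowercase ASCII replacement alphabet are all assumptions made for illustration.

```python
import random
import string


def scramble_word(word, rng):
    """Shuffle the inner characters of a word, keeping first and last in place.

    Assumption: words of three characters or fewer are left unchanged.
    """
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def flip_chars(word, density, rng, alphabet=string.ascii_lowercase):
    """Replace each character by a random one from `alphabet` with probability `density`.

    Assumption: the replacement alphabet is lowercase ASCII.
    """
    return "".join(
        rng.choice(alphabet) if rng.random() < density else ch for ch in word
    )


def perturb(sentence, density, rng, noise="scramble"):
    """Apply one noise type to a whitespace-tokenized sentence.

    For "scramble", `density` is the per-word probability of scrambling;
    for "flip", it is the per-character probability of replacement.
    """
    words = sentence.split()
    if noise == "scramble":
        noisy = [scramble_word(w, rng) if rng.random() < density else w for w in words]
    elif noise == "flip":
        noisy = [flip_chars(w, density, rng) for w in words]
    else:
        raise ValueError(f"unknown noise type: {noise}")
    return " ".join(noisy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    text = "word scrambling or random noise"
    print(perturb(text, 0.5, rng, noise="scramble"))
    print(perturb(text, 0.1, rng, noise="flip"))
```

Under these assumptions, a mismatch in error type would correspond to training with one `noise` setting and testing with the other, while a mismatch in error density would correspond to using different `density` values for training and testing.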


