DFKI-LT - Generating Virtual Parallel Corpus: A Compatibility Centric Method
MT Summit XIII, Xiamen, China, 9/2011
The processing of many natural languages suffers from scarce linguistic resources. We introduce the idea of compatibility to extend training data for machine translation: if translation hypotheses produced by multiple systems are measured as compatible, they are considered reliable predictions. In this way, we generate virtual parallel data per bridge language, and re-compiling on this corpus improves our machine translation quality by more than 30% in relative terms.
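The compatibility idea can be illustrated with a minimal sketch. The abstract does not specify how compatibility is measured, so everything below is an assumption made for illustration: `token_overlap` (Jaccard overlap of token sets) stands in for the real measure, the 0.6 threshold is arbitrary, and the "systems" are hypothetical callables mapping a source sentence to a translation. The bridge-language step is omitted.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased token sets (illustrative stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_compatible(hypotheses: Sequence[str], threshold: float = 0.6) -> bool:
    """True when every pair of hypotheses reaches the (assumed) threshold."""
    if len(hypotheses) < 2:
        return False  # agreement needs at least two systems
    return all(
        token_overlap(a, b) >= threshold
        for a, b in combinations(hypotheses, 2)
    )


def build_virtual_corpus(
    sources: Sequence[str],
    systems: Sequence[Callable[[str], str]],
    threshold: float = 0.6,
) -> List[Tuple[str, str]]:
    """Keep (source, hypothesis) pairs only where the systems agree."""
    corpus = []
    for src in sources:
        hyps = [translate(src) for translate in systems]
        if is_compatible(hyps, threshold):
            # Assumption: take the first system's output as the virtual target.
            corpus.append((src, hyps[0]))
    return corpus
```

Sentences on which the systems disagree are simply dropped, so the resulting virtual corpus trades coverage for reliability; a stricter threshold shifts that trade further toward reliability.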
Files: BibTeX, mtsummit13_submission_13.pdf