English-Oromo Machine Translation: An Experiment Using a Statistical Approach

Sisay Adugna; Andreas Eisele

In: Mike Rosner, Daniel Tapias (eds.). Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10). International Conference on Language Resources and Evaluation (LREC-2010), May 19-21, Valletta, Malta, pages 2196-2199, ISBN 2-9517408-6-7, European Language Resources Association (ELRA), May 2010.


This paper deals with the translation of English documents into Oromo using statistical methods. Whereas English is the lingua franca of online information, Oromo, despite its relatively wide distribution within Ethiopia and neighbouring countries such as Kenya and Somalia, is one of the most resource-scarce languages. The paper has two main goals. The first is to test how far we can go with the limited parallel corpus available for the English-Oromo language pair, and how well existing Statistical Machine Translation (SMT) systems apply to this pair. The second is to analyze the output of the system in order to identify the challenges that need to be tackled. Because Oromo is resource-scarce, we could not obtain as many parallel documents as we wanted for the experiment. Nevertheless, using a limited corpus of 20,000 bilingual sentences and 62,300 monolingual sentences, a translation accuracy of 17.74% in terms of BLEU score was achieved.
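For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score can be computed. It is not the authors' evaluation script: it assumes a single reference per sentence, pre-tokenized input, n-grams up to length 4 with uniform weights, and no smoothing. The function names `ngrams` and `corpus_bleu` and the toy English tokens are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions of this sketch: uniform n-gram weights, no smoothing,
    and inputs that are already lists of tokens.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # n-grams produced by the system, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["the cat sat on the mat".split()]
    ref = ["the cat sat on the red mat".split()]
    print(round(corpus_bleu(hyp, ref), 2))
```

Reported BLEU figures also depend on tokenization and on the number of references, so scores from a sketch like this are not directly comparable with the 17.74% stated in the abstract.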


German Research Center for Artificial Intelligence