DFKI System Combination with Sentence Ranking at ML4HMT-2011
Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) and of the Shared Task on Applying Machine Learning Techniques to Optimising the Division of Labour in Hybrid Machine Translation (M, Barcelona, Spain, Center for Language and Speech Technologies and Applications (TALP), Technical University of Catalonia, 2011
We present a pilot study of a Hybrid Machine Translation system that takes advantage of multilateral system-specific meta-data provided as part of the shared task. The proposed solution is a machine learning approach, resulting in a selection mechanism that learns to rank system outputs at the sentence level according to their quality. Because no human annotations were available for training, word-level Levenshtein distance was used as a quality indicator, while a rich set of sentence features was extracted and selected from the dataset. Three classification algorithms (Naive Bayes, SVM and Linear Regression) were trained and tested on pairwise comparisons of sentences represented by these features. The approaches yielded high correlation with the original rankings (tau = 0.52) and selected the best translation in 54% of the cases.
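As an illustration of the quality indicator described in the abstract, the following minimal sketch computes a word-level Levenshtein distance between each system output and a reference translation, ranks the outputs by that distance, and derives pairwise preference labels of the kind a classifier could be trained on. It is not the authors' implementation: whitespace tokenization, the availability of a reference translation, the tie handling, and all function and system names (`word_levenshtein`, `rank_outputs`, `pairwise_labels`, `sys_a`, ...) are assumptions made for this example, and feature extraction and classifier training are left out.

```python
from __future__ import annotations

from itertools import combinations


def word_levenshtein(hypothesis: str, reference: str) -> int:
    """Edit distance counted in words (assumes whitespace tokenization)."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] = distance between the first i-1 hypothesis words
    # and the first j reference words.
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def rank_outputs(outputs: dict[str, str], reference: str) -> list[tuple[str, int]]:
    """Sort system outputs from closest to farthest from the reference."""
    scored = [(name, word_levenshtein(text, reference)) for name, text in outputs.items()]
    return sorted(scored, key=lambda item: item[1])


def pairwise_labels(outputs: dict[str, str], reference: str) -> list[tuple[str, str, int]]:
    """Build (system_a, system_b, label) triples; label is 1 if a is better, -1 if b is.

    Ties are skipped here, which is an assumption of this sketch.
    """
    distance = {name: word_levenshtein(text, reference) for name, text in outputs.items()}
    pairs = []
    for a, b in combinations(sorted(distance), 2):
        if distance[a] != distance[b]:
            pairs.append((a, b, 1 if distance[a] < distance[b] else -1))
    return pairs


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    outputs = {
        "sys_a": "the cat sat on the mat",
        "sys_b": "the cat sat on mat",
        "sys_c": "a dog sat at the mat",
    }
    print(rank_outputs(outputs, reference))
    print(pairwise_labels(outputs, reference))
```

In a setup like the one the abstract describes, each pair would additionally carry the sentence features of both outputs, and the label would serve as the training target for the pairwise classifier.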