Proceedings of the Eighth Workshop on Statistical Machine Translation (WMT-13), August 8-9, 2013, Sofia, Bulgaria, pages 329-336. Association for Computational Linguistics.
This paper describes a set of experiments on two sub-tasks of Quality Estimation of Machine Translation (MT) output. Sentence-level ranking of alternative MT outputs is performed with pairwise classifiers based on Logistic Regression, using black-box features derived from PCFG parsing, language models and various counts. Post-editing time prediction uses regression models that are additionally fed with new, elaborate features from the Statistical MT decoding process. These seem to be better indicators of post-editing time than the black-box features. Prior to training the models, feature scoring with ReliefF and Information Gain is used to select feature sets of manageable size and to limit computational cost.
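The pairwise ranking approach mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each pair of alternative translations is represented by the difference of their feature vectors, that a plain gradient-descent logistic regression stands in for the classifier, and that the final ranking is obtained by counting pairwise wins. All function names and parameters (`make_pairs`, `train_logreg`, `rank`, `epochs`, `lr`) are hypothetical.

```python
import math
from itertools import combinations


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_pairs(candidates):
    """Build pairwise training examples for one source sentence.

    candidates: list of (feature_vector, gold_rank); a lower rank is better.
    Assumption of this sketch: a pair is encoded as a feature difference.
    """
    pairs = []
    for (fa, ra), (fb, rb) in combinations(candidates, 2):
        if ra == rb:
            continue  # ties carry no preference signal in this sketch
        diff = [a - b for a, b in zip(fa, fb)]
        label = 1 if ra < rb else 0
        pairs.append((diff, label))
        # Add the mirrored pair so that both classes are represented.
        pairs.append(([-d for d in diff], 1 - label))
    return pairs


def train_logreg(pairs, epochs=500, lr=0.1):
    """Fit logistic regression weights by batch gradient descent (no bias)."""
    n = len(pairs[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in pairs:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(n):
                grad[i] += err * x[i]
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


def rank(features, w):
    """Return candidate indices ordered best-first by number of pairwise wins."""
    wins = [0] * len(features)
    for i, j in combinations(range(len(features)), 2):
        diff = [a - b for a, b in zip(features[i], features[j])]
        p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
        if p >= 0.5:
            wins[i] += 1
        else:
            wins[j] += 1
    return sorted(range(len(features)), key=lambda k: -wins[k])
```

In use, `make_pairs` would be called once per source sentence, the resulting pairs concatenated and passed to `train_logreg`, and `rank` applied to the feature vectors of the alternative outputs of an unseen sentence. Other pair encodings (for example, concatenating both feature vectors) are equally plausible; the difference encoding is chosen here only because it keeps the sketch short.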