Eva Hajičová (editor).
The Prague Bulletin of Mathematical Linguistics (PBML) 100, pp. 63-72. Charles University in Prague, Prague, Czech Republic, September 2013.
Recent research on, and applications of, the evaluation and quality estimation of Machine Translation require statistical measures for comparing machine-predicted rankings against gold-standard sets annotated by humans. In addition to the existing practice of measuring segment-level correlation with Kendall's tau, we propose using ranking metrics from the field of Information Retrieval, such as Mean Reciprocal Rank, Normalized Discounted Cumulative Gain and Expected Reciprocal Rank. These metrics reward systems more for correctly predicting the highest-ranked items than for correctly predicting lower-ranked ones. We present an open-source tool that implements these metrics. It can either be run independently as a script that supports common formats or be imported into any Python application.
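As a minimal sketch of why these metrics behave differently from Kendall's tau, the Python below scores the candidate translations of a single source sentence. It assumes that rank 1 is best, that human ranks are mapped to graded relevance as `worst_rank - rank`, and that Mean Reciprocal Rank treats only the item(s) the humans ranked best as relevant. All function names are hypothetical and are not the API of the tool described here; `kendall_tau` is the plain tau-a variant that ignores ties, not any tie-aware variant.

```python
import math
from typing import Sequence


def ranks_to_relevance(human_ranks: Sequence[int]) -> list[int]:
    # Assumption: rank 1 is best; the best item gets the highest relevance grade.
    worst = max(human_ranks)
    return [worst - r for r in human_ranks]


def order_by_prediction(predicted_ranks: Sequence[int]) -> list[int]:
    # Item indices from best to worst predicted rank; ties broken by index.
    return sorted(range(len(predicted_ranks)), key=lambda i: (predicted_ranks[i], i))


def kendall_tau(predicted_ranks, human_ranks) -> float:
    # Tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither.
    n, conc, disc = len(human_ranks), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (predicted_ranks[i] - predicted_ranks[j]) * (human_ranks[i] - human_ranks[j])
            conc += s > 0
            disc += s < 0
    return (conc - disc) / (n * (n - 1) / 2)


def reciprocal_rank(predicted_ranks, human_ranks) -> float:
    # 1 / position of the first human-best item in the predicted order.
    best = min(human_ranks)
    for pos, idx in enumerate(order_by_prediction(predicted_ranks), start=1):
        if human_ranks[idx] == best:
            return 1.0 / pos
    return 0.0


def mean_reciprocal_rank(pairs) -> float:
    # pairs: list of (predicted_ranks, human_ranks), one per source sentence.
    return sum(reciprocal_rank(p, h) for p, h in pairs) / len(pairs)


def dcg(gains) -> float:
    # Exponential gain (2^g - 1) discounted by log2(position + 1).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(predicted_ranks, human_ranks, k=None) -> float:
    # DCG of the predicted order divided by DCG of the ideal (human) order.
    rel = ranks_to_relevance(human_ranks)
    gains = [rel[i] for i in order_by_prediction(predicted_ranks)][:k]
    idcg = dcg(sorted(rel, reverse=True)[:k])
    return dcg(gains) / idcg if idcg > 0 else 0.0


def err(predicted_ranks, human_ranks) -> float:
    # Cascade model: the reader stops at position r with probability R_r.
    rel = ranks_to_relevance(human_ranks)
    g_max = max(rel)
    p_continue, score = 1.0, 0.0
    for pos, idx in enumerate(order_by_prediction(predicted_ranks), start=1):
        r = (2 ** rel[idx] - 1) / (2 ** g_max)
        score += p_continue * r / pos
        p_continue *= 1 - r
    return score


if __name__ == "__main__":
    human = [1, 2, 3]  # hypothetical human ranks of three candidates
    for name, pred in [("swap bottom", [1, 3, 2]), ("swap top", [2, 1, 3])]:
        print(name, kendall_tau(pred, human), reciprocal_rank(pred, human),
              ndcg(pred, human), err(pred, human))
```

Swapping the top two candidates and swapping the bottom two each produce exactly one discordant pair, so Kendall's tau is identical for both; reciprocal rank, NDCG and ERR all penalize the top swap more, which is the property the abstract describes.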