
Sabine Hunsicker, Radu Ion, Dan Stefanescu
Hybrid Parallel Sentence Mining from Comparable Corpora
Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, 2012
 
This paper presents LEXACC, a fast and accurate parallel sentence mining algorithm for comparable corpora. It is based on the Cross-Language Information Retrieval framework, combined with a trainable translation similarity measure that detects pairs of parallel and quasi-parallel sentences. LEXACC obtains state-of-the-art results in comparison with established approaches.
 