DFKI-LT - Hybrid Parallel Sentence Mining from Comparable Corpora

Sabine Hunsicker, Radu Ion, Dan Stefanescu
Hybrid Parallel Sentence Mining from Comparable Corpora
Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, 2012
 
This paper presents LEXACC, a fast and accurate parallel sentence mining algorithm for comparable corpora. It is based on the Cross-Language Information Retrieval framework, combined with a trainable translation similarity measure that detects pairs of parallel and quasi-parallel sentences. LEXACC obtains state-of-the-art results in comparison with established approaches.
 
Files: BibTeX, 36.pdf