Publication

Hybrid Parallel Sentence Mining from Comparable Corpora

Sabine Hunsicker, Radu Ion, Dan Stefanescu

In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT-12), May 28-30, 2012, Trento, Italy.

Abstract

This paper presents LEXACC, a fast and accurate parallel sentence mining algorithm for comparable corpora. It is based on the Cross-Language Information Retrieval framework, combined with a trainable translation similarity measure that detects pairs of parallel and quasi-parallel sentences. LEXACC obtains state-of-the-art results in comparison with established approaches.

Further Links

36.pdf (pdf, 484 KB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz