Empirical studies on language contrast using the English-German comparable and parallel CroCo corpus

Stella Neumann; Silvia Hansen-Schirra; Oliver Culo; Mihaela Vela

In: Proceedings of the Workshop on Building and Using Comparable Corpora, held at the International Conference on Language Resources and Evaluation (LREC-2008), May 31, Marrakech, Morocco, pages 47-51, 5/2008.


This paper presents results from empirical studies on language contrasts, translation shifts and translation strategies obtained by exploiting the CroCo Corpus. Using word order peculiarities and diverging part-of-speech frequencies in English and German as examples, it aims to show that insights from investigating the comparable parts of the corpus can be complemented by also exploiting the parallel parts. The exploitation of the corpus proceeds in two steps. First, contrastive differences are identified in the comparable parts of the corpus. Second, the solutions chosen by human translators to deal with these contrastive differences are identified in the parallel parts. These solutions can be used to decide between possible translation strategies and can serve as templates for translation strategies to be adopted in the development of machine translation (MT) systems.
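
The two-step procedure described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy data, the tag names, the threshold and all function names are hypothetical, and nothing here reflects the actual CroCo Corpus format or query tools. Step one compares relative part-of-speech frequencies in untranslated English and German texts; step two counts how translators changed the part of speech in aligned word pairs.

```python
from collections import Counter

# Hypothetical toy data: (token, POS) pairs, NOT taken from the CroCo Corpus.
english_originals = [("the", "DET"), ("decision", "NOUN"), ("was", "VERB"),
                     ("made", "VERB"), ("quickly", "ADV")]
german_originals = [("die", "DET"), ("Entscheidung", "NOUN"), ("fiel", "VERB"),
                    ("schnell", "ADV"), ("die", "DET"), ("Umsetzung", "NOUN")]

# Hypothetical aligned word pairs from the parallel part,
# reduced to (source POS, target POS).
aligned_pos_pairs = [("VERB", "NOUN"), ("VERB", "VERB"),
                     ("VERB", "NOUN"), ("NOUN", "NOUN")]


def relative_pos_frequencies(tagged_tokens):
    """Return the share of each POS tag among all tokens."""
    counts = Counter(pos for _, pos in tagged_tokens)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


def diverging_tags(freq_a, freq_b, threshold):
    """Step 1: tags whose relative frequency differs by at least `threshold`."""
    result = {}
    for tag in set(freq_a) | set(freq_b):
        diff = freq_a.get(tag, 0.0) - freq_b.get(tag, 0.0)
        if abs(diff) >= threshold:
            result[tag] = diff
    return result


def pos_shifts(pairs):
    """Step 2: count aligned pairs where the translator changed the POS."""
    return Counter((src, tgt) for src, tgt in pairs if src != tgt)


en_freq = relative_pos_frequencies(english_originals)
de_freq = relative_pos_frequencies(german_originals)
contrasts = diverging_tags(en_freq, de_freq, threshold=0.2)
shifts = pos_shifts(aligned_pos_pairs)
```

In such a setup, `contrasts` would point to candidate contrastive differences, and `shifts` would show which solutions translators chose most often for them; on real data the threshold would more plausibly be replaced by a significance test.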

LREC2008.pdf (pdf, 174 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence