DFKI-LT - Paraphrase Fragment Extraction from Monolingual Comparable Corpora
Proceedings of the ACL Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, Association for Computational Linguistics, 6/2011
We present a novel paraphrase fragment pair extraction method that uses a monolingual comparable corpus containing different articles about the same topics or events. The procedure consists of three stages: document pair extraction, sentence pair extraction, and fragment pair extraction. At each stage, we evaluate the intermediate results manually and tune the later stages accordingly. With this minimally supervised approach, we achieve 62% accuracy on the paraphrase fragment pairs we collected and 67% on those extracted from the MSR corpus. The results look promising given the minimal supervision of the approach, which can be further scaled up.
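The three-stage structure described in the abstract can be illustrated with a toy sketch. The abstract does not specify how each stage is implemented, so everything below is an assumption made for illustration only: word-set (Jaccard) overlap stands in for the document and sentence pairing criteria, common prefix/suffix stripping stands in for fragment extraction, and the function names and thresholds (`doc_threshold`, `sent_threshold`) are hypothetical. It is not the method used in the paper.

```python
from itertools import product


def tokens(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def jaccard(a, b):
    """Word-set overlap between two texts, in [0, 1]."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def pair_up(left, right, threshold):
    """Return all (l, r) pairs whose Jaccard overlap reaches the threshold."""
    return [(l, r) for l, r in product(left, right) if jaccard(l, r) >= threshold]


def differing_fragments(s1, s2):
    """Strip the common prefix and suffix of two sentences and return the
    differing middle parts as a candidate fragment pair (toy heuristic,
    assumed here; not taken from the paper)."""
    t1, t2 = tokens(s1), tokens(s2)
    shortest = min(len(t1), len(t2))
    start = 0
    while start < shortest and t1[start] == t2[start]:
        start += 1
    end = 0
    while end < shortest - start and t1[-1 - end] == t2[-1 - end]:
        end += 1
    return " ".join(t1[start:len(t1) - end]), " ".join(t2[start:len(t2) - end])


def extract_fragment_pairs(docs_a, docs_b, doc_threshold=0.3, sent_threshold=0.5):
    """Run the three stages: document pairs, sentence pairs, fragment pairs."""
    fragment_pairs = []
    # Stage 1: pair documents that appear to cover the same topic or event.
    for doc_a, doc_b in pair_up(docs_a, docs_b, doc_threshold):
        # Naive sentence splitting on periods (an assumption for brevity).
        sents_a = [s.strip() for s in doc_a.split(".") if s.strip()]
        sents_b = [s.strip() for s in doc_b.split(".") if s.strip()]
        # Stage 2: pair sentences with high lexical overlap.
        for s1, s2 in pair_up(sents_a, sents_b, sent_threshold):
            # Stage 3: keep the parts where the two sentences differ.
            f1, f2 = differing_fragments(s1, s2)
            if f1 and f2:
                fragment_pairs.append((f1, f2))
    return fragment_pairs


docs_a = ["The company bought the startup for ten million dollars. Shares rose sharply."]
docs_b = ["The company acquired the startup for ten million dollars. Investors were pleased."]
pairs = extract_fragment_pairs(docs_a, docs_b)
```

On the two example documents, this sketch pairs the first sentences and returns the single candidate pair `("bought", "acquired")`. In a pipeline like the one the abstract describes, the thresholds at each stage would be the natural place to apply the manual evaluation and tuning mentioned above.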
Files: BibTeX, BUCC2011.pdf