DFKI-LT - Using Discourse Information for Paraphrase Extraction

Michaela Regneri, Rui Wang
Using Discourse Information for Paraphrase Extraction
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 916-927, Jeju Island, Republic of Korea, Association for Computational Linguistics, July 2012
 
Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents’ discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boosts the performance of sentence-level paraphrase acquisition, which consequently gives a tremendous advantage for extracting phrase-level paraphrase fragments from matched sentences. Our system beats an informed baseline by a margin of 50%.
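
The idea of matching sentences by their sequential order can be illustrated with a small sketch. It is not the authors' system: it swaps in pairwise Needleman-Wunsch alignment for the multiple sequence alignment named in the abstract, and plain word overlap (Jaccard) for the semantic similarity measure. The function names, the gap penalty, the threshold, and the example sentences are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a stand-in for a semantic measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def align(doc_a, doc_b, gap=-0.1, threshold=0.3):
    """Order-preserving (Needleman-Wunsch) alignment of two sentence lists.

    Returns (i, j) index pairs of aligned sentences whose similarity
    reaches the threshold. gap and threshold are illustrative values.
    """
    n, m = len(doc_a), len(doc_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = jaccard(doc_a[i - 1], doc_b[j - 1])
            options = [
                # Pairs below the threshold get a negative match score.
                (score[i - 1][j - 1] + sim - threshold, "match"),
                (score[i - 1][j] + gap, "skip_a"),  # leave doc_a[i-1] unaligned
                (score[i][j - 1] + gap, "skip_b"),  # leave doc_b[j-1] unaligned
            ]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if move[i][j] == "match":
            if jaccard(doc_a[i - 1], doc_b[j - 1]) >= threshold:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_a":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    doc_a = [
        "the doctor enters the room",
        "the doctor examines the patient",
        "the patient leaves the hospital",
    ]
    doc_b = [
        "the doctor examines a patient",
        "the patient then leaves the hospital",
    ]
    for i, j in align(doc_a, doc_b):
        print(doc_a[i], "<->", doc_b[j])
```

Because the alignment must respect the order of sentences in both texts, a sentence pair is only matched when doing so is consistent with the surrounding matches, which is the kind of discourse constraint the abstract refers to.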
 
Files: BibTeX, D12-1084, D12-1084.pdf