DFKI-LT - Using Discourse Information for Paraphrase Extraction
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents' discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boosts the performance of sentence-level paraphrase acquisition, which consequently gives a tremendous advantage for extracting phrase-level paraphrase fragments from matched sentences. Our system beats an informed baseline by a margin of 50%.
Files: BibTeX, D12-1084, D12-1084.pdf