Parsing Discourse Structures for Semantic Storytelling: Evaluating an efficient RST Parser

Pia Linscheid, Peter Bourgonje, Georg Rehm

In: Adrian Paschke, Georg Rehm, Jamal Al Qundus, Clemens Neudecker, Lydia Pintscher (Eds.). Proceedings of QURATOR 2021 -- Conference on Digital Curation Technologies. Conference on Digital Curation Technologies (QURATOR-2021), February 8-12, Berlin/Virtual, Germany. CEUR Workshop Proceedings 2/2021.


We explore whether an efficient RST parser performs as well on a register-balanced data set as it does on the register-specific data of its training and test phases. In this paper, we evaluate an efficient parser that identifies semantic connections between textual elements within English texts based on Rhetorical Structure Theory (RST). The parser was tested on data from the Georgetown University Multilayer Corpus (GUM). Its output was compared with the manual GUM RST annotations, which serve as the gold standard, using the evaluation tool RST-Tace. This investigation is motivated by the underlying question of whether the parser can be used to process discourse structure in a broader application scenario we call “semantic storytelling”, an approach for processing content to extract and analyse information and to generate storylines that support knowledge workers such as journalists or scholars. Our results are based on a relatively small amount of data, yet they show a clear difference between manual and parsed annotation of discourse structure. This difference is due to a number of factors, among them the assignment of default labels to ambiguous relations. The results demonstrate a need for improved disambiguation methods when assigning rhetorical relations.
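To illustrate why default labels for ambiguous relations lower agreement with a gold standard, here is a minimal sketch of label-level comparison. It is not the RST-Tace algorithm, which performs a richer tree comparison. The span-to-label representation, the function name `compare_relations`, and the toy data below are hypothetical assumptions for illustration only; they are not drawn from GUM or from the parser evaluated here. In the toy data, the parser falls back to a hypothetical default label, `elaboration`, where the gold annotation has `cause`.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical representation (assumption): an RST analysis maps an EDU span
# (start, end) to the relation label assigned to that span.
Analysis = Dict[Tuple[int, int], str]


def compare_relations(gold: Analysis, parsed: Analysis) -> Tuple[float, Counter]:
    """Return the share of gold spans whose label the parser reproduced,
    plus a Counter of (gold_label, parsed_label) mismatches.

    A parsed label of None means the parser did not build that span at all.
    """
    matched = 0
    confusions: Counter = Counter()
    for span, gold_label in gold.items():
        parsed_label: Optional[str] = parsed.get(span)
        if parsed_label == gold_label:
            matched += 1
        else:
            confusions[(gold_label, parsed_label)] += 1
    agreement = matched / len(gold) if gold else 0.0
    return agreement, confusions


if __name__ == "__main__":
    # Toy data (hypothetical): the parser assigns a default label
    # ("elaboration") to a span the gold standard labels "cause".
    gold = {(1, 2): "cause", (3, 4): "contrast", (1, 4): "elaboration"}
    parsed = {(1, 2): "elaboration", (3, 4): "contrast", (1, 4): "elaboration"}
    agreement, confusions = compare_relations(gold, parsed)
    print(round(agreement, 2))  # → 0.67
    print(confusions)           # → Counter({('cause', 'elaboration'): 1})
```

Counting mismatches as (gold, parsed) pairs rather than only reporting a single score makes systematic fallbacks visible: if one parsed label dominates the confusion pairs, it is likely acting as a default for relations the parser cannot disambiguate.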



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence