Publication

From Automatic Keyword Detection to Ontology-Based Topic Modeling

Marc Beck; Syed Tahseen Raza Rizvi; Andreas Dengel; Sheraz Ahmed
In: Xiang Bai; Dimosthenis Karatzas; Daniel Lopresti (Eds.). 14th IAPR International Workshop on Document Analysis Systems (DAS-2020), July 26-29, Wuhan, China, Pages 451-465, Lecture Notes in Computer Science (LNCS), Vol. 12116, ISBN 978-3-030-57058-3, Springer, Cham, 8/2020.

Abstract

In this paper, we propose a novel two-stage system for keyword detection and ontology-driven topic modeling. The first stage specializes in keyword detection, for which we introduce a novel graph-based unsupervised approach called Collective Connectivity-Aware Node Weight (CoCoNoW) for detecting keywords in scientific literature. CoCoNoW builds a connectivity-aware graph from a given publication text and assigns weights to the extracted keywords to sort them in order of relevance. The second stage specializes in topic modeling, where a domain ontology serves as an attention map (context) for topic modeling based on the detected keywords. The use of an ontology makes this approach independent of domain and language. CoCoNoW is extensively evaluated on three publicly available datasets: Hulth2003, NLM500, and SemEval2010. Analysis of the results reveals that CoCoNoW consistently outperforms the state-of-the-art approaches on the respective datasets.
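
To make the two-stage idea concrete, the following is a minimal sketch of a generic graph-based keyword ranker followed by a dictionary-based topic lookup. It is not the CoCoNoW algorithm, whose weighting scheme is not given in this abstract: the co-occurrence window, the weighted-degree score, the stop-word list, and the toy ontology are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list used only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "for", "to", "is", "on", "from", "we", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words and short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected graph whose edge weights count co-occurrences
    of two distinct words within `window` following tokens."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def rank_keywords(text, window=2, top_k=5):
    """Stage 1 (assumed scoring): rank words by weighted degree in the graph."""
    graph = build_cooccurrence_graph(tokenize(text), window)
    scores = {word: sum(neighbours.values()) for word, neighbours in graph.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def assign_topics(keywords, ontology):
    """Stage 2 (assumed scoring): score each ontology topic by the summed
    weight of the ranked keywords that appear among its terms."""
    topic_scores = {}
    for topic, terms in ontology.items():
        score = sum(weight for word, weight in keywords if word in terms)
        if score > 0:
            topic_scores[topic] = score
    return sorted(topic_scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "graph keyword graph keyword graph topic"
    # Toy ontology mapping topic names to term sets; purely hypothetical.
    toy_ontology = {
        "graph theory": {"graph", "node"},
        "nlp": {"keyword", "topic"},
        "vision": {"image"},
    }
    ranked = rank_keywords(sample, window=2, top_k=3)
    print(ranked)
    print(assign_topics(ranked, toy_ontology))
```

Because the topic lookup only consults the ontology's term sets, swapping in an ontology for another domain or language changes the topics without touching the ranking code, which is the kind of independence the abstract attributes to the ontology-driven stage.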