From Automatic Keyword Detection to Ontology-Based Topic Modeling

Marc Beck, Syed Tahseen Raza Rizvi, Andreas Dengel, Sheraz Ahmed

In: Xiang Bai, Dimosthenis Karatzas, Daniel Lopresti (editors). 14th IAPR International Workshop on Document Analysis Systems (DAS-2020), July 26-29, Wuhan, China. Pages 451-465, Lecture Notes in Computer Science (LNCS), volume 12116, ISBN 978-3-030-57058-3, Springer, Cham, 8/2020.


In this paper, we propose a novel two-stage system for keyword detection and ontology-driven topic modeling. The first stage performs keyword detection: we introduce a novel graph-based, unsupervised approach called Collective Connectivity-Aware Node Weight (CoCoNoW) for detecting keywords in scientific literature. CoCoNoW builds a connectivity-aware graph from a given publication text and assigns weights to the extracted keywords in order to sort them by relevance. The second stage performs topic modeling, where a domain ontology serves as an attention map, or context, for topic modeling based on the detected keywords. The use of an ontology makes this approach independent of domain and language. CoCoNoW is extensively evaluated on three publicly available datasets: Hulth2003, NLM500, and SemEval2010. Analysis of the results shows that CoCoNoW consistently outperforms the state-of-the-art approaches on the respective datasets.
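
To make the two-stage idea concrete, the following is a minimal sketch and not the CoCoNoW algorithm itself, whose node-weighting scheme is not specified in this abstract. As a stand-in, it ranks words by their weighted degree in a plain co-occurrence graph and then scores topics by matching the ranked keywords against a toy ontology. The function names, the stopword list, the window size, and the ontology structure are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_cooccurrence_graph(tokens, window=2):
    """Return undirected edges with weights.

    The weight of an edge counts how often two distinct words appear
    within `window` positions of each other.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                edges[frozenset((word, other))] += 1
    return edges


def rank_keywords(text, window=2, top_k=5):
    """Stage 1 stand-in: rank words by weighted degree (sum of edge weights)."""
    edges = build_cooccurrence_graph(tokenize(text), window)
    strength = defaultdict(int)
    for pair, weight in edges.items():
        for node in pair:
            strength[node] += weight
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def assign_topics(keywords, ontology):
    """Stage 2 stand-in: score each topic by the summed weight of its matched keywords.

    `ontology` is a hypothetical mapping {topic name: set of terms}.
    """
    scores = Counter()
    for word, weight in keywords:
        for topic, terms in ontology.items():
            if word in terms:
                scores[topic] += weight
    return scores.most_common()


if __name__ == "__main__":
    sample = "graph keyword graph ranking graph keyword"
    toy_ontology = {
        "graph theory": {"graph"},
        "information retrieval": {"keyword", "ranking"},
    }
    keywords = rank_keywords(sample, window=2, top_k=3)
    print(keywords)
    print(assign_topics(keywords, toy_ontology))
```

Swapping the weighted-degree score for a different node weight only requires changing `rank_keywords`. The ontology lookup depends only on the ranked keyword list, which reflects the separation between the two stages described above.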


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz