Interactively Constructing Knowledge Graphs from Messy User-Generated Spreadsheets

Markus Schröder; Christian Jilek; Michael Schulze; Andreas Dengel

In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2103.03537, Pages 1-15, arXiv, 3/2021.


When knowledge workers fill spreadsheets freely, the content can be rather unstructured. For humans, and especially for machines, it becomes difficult to interpret such data properly. Therefore, spreadsheets are often converted to a more explicit, formal and structured form, for example, a knowledge graph. However, if a data maintenance strategy has been missing and user-generated data has become "messy", constructing a knowledge graph is a challenging task. In this paper, we catalog several of those challenges and propose an interactive approach to solve them. Our approach includes a graphical user interface which enables knowledge engineers to bulk-annotate spreadsheet cells with extracted information. Based on the cells' annotations, a knowledge graph is ultimately formed. Using five spreadsheets from an industrial scenario, we built a 25k-triple graph during our evaluation. We compared our method with the state-of-the-art RDF Mapping Language (RML) approach. The comparison highlights the contributions of our approach.
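
The general idea of annotating cells in bulk and then deriving triples from the annotations can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sheet, the regular expressions, the function names bulk_annotate and build_triples, and the ex: vocabulary are all assumptions made for illustration only.

```python
import re

# Hypothetical toy sheet: (row, column) -> free-text cell content.
cells = {
    (0, 0): "Project Alpha",
    (0, 1): "J. Doe / M. Smith",
    (1, 0): "Project Beta",
    (1, 1): "M. Smith",
}


def bulk_annotate(cells, column, pattern):
    """Annotate every cell of one column with all matches of a pattern.

    Returns a dict mapping row index -> list of extracted strings.
    """
    annotations = {}
    for (row, col), text in cells.items():
        if col == column:
            annotations[row] = [m.group() for m in re.finditer(pattern, text)]
    return annotations


def build_triples(projects_by_row, persons_by_row):
    """Form (subject, predicate, object) triples from row-aligned annotations."""
    triples = set()
    for row, projects in projects_by_row.items():
        for project in projects:
            triples.add((project, "rdf:type", "ex:Project"))
            for person in persons_by_row.get(row, []):
                triples.add((person, "rdf:type", "ex:Person"))
                triples.add((project, "ex:hasMember", person))
    return triples


# One annotation rule per column, applied to all cells of that column at once.
projects_by_row = bulk_annotate(cells, 0, r"Project \w+")
persons_by_row = bulk_annotate(cells, 1, r"[A-Z]\. [A-Z][a-z]+")
triples = build_triples(projects_by_row, persons_by_row)

for triple in sorted(triples):
    print(triple)
```

In this sketch a single rule annotates a whole column, so a cell holding several names in one free-text string still yields one entity per name. The tuples stand in for RDF triples; a real pipeline would mint proper IRIs and use an RDF library.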


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence