Enriching BERT with Knowledge Graph Embeddings for Document Classification

Malte Ostendorff, Peter Bourgonje, Maria Berger, Julian Moreno Schneider, Georg Rehm, Bela Gipp

In: Proceedings of the GermEval 2019 Workshop, located at KONVENS 2019 (Konferenz zur Verarbeitung natürlicher Sprache), October 8, 2019, Nürnberg, Germany. [online] 2019.


Several professional job profiles consist of various content curation processes, i.e., analysing, structuring, translating and making sense of large and heterogeneous amounts of digital content. Many of these processes are laborious and can be fully or partially automated with the help of language processing services. An essential preprocessing step is the initial classification according to topic, genre and text type, which is, in subsequent phases, used to route a given piece of incoming content to the corresponding services. In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach, we achieve significantly better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1-score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. The source code of our experiments and the trained models are publicly available on GitHub.
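The combination described in the abstract can be illustrated with a minimal sketch: a document vector (e.g., from BERT), a knowledge graph embedding of the author, and numeric metadata features are concatenated and fed into a classifier over the genre labels. All dimensions, names, and the single linear layer below are illustrative assumptions for exposition, not the paper's actual architecture; in practice the vectors would come from a pretrained BERT model, a trained graph embedding model, and the book metadata.

```python
import numpy as np

# Hypothetical dimensions; the actual model in the paper may differ.
BERT_DIM = 768      # assumed size of the BERT document representation
AUTHOR_DIM = 200    # assumed size of the author knowledge graph embedding
META_DIM = 10       # assumed number of numeric metadata features
NUM_CLASSES = 8     # coarse-grained label set mentioned in the abstract

rng = np.random.default_rng(0)

def combine_features(text_vec, author_vec, meta_vec):
    """Concatenate the three representations into one input vector."""
    return np.concatenate([text_vec, author_vec, meta_vec])

def classify(x, weights, bias):
    """A single linear layer with a softmax over the class labels."""
    logits = x @ weights + bias
    exp = np.exp(logits - logits.max())  # stabilised softmax
    return exp / exp.sum()

# Stand-in random vectors; real inputs would come from BERT, the
# knowledge graph embedding model, and the book metadata respectively.
text_vec = rng.normal(size=BERT_DIM)
author_vec = rng.normal(size=AUTHOR_DIM)
meta_vec = rng.normal(size=META_DIM)

x = combine_features(text_vec, author_vec, meta_vec)
w = rng.normal(size=(BERT_DIM + AUTHOR_DIM + META_DIM, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)
probs = classify(x, w, b)
print(x.shape, probs.shape)
```

Concatenation before the final classification layer is one common way to fuse heterogeneous representations; the trained model would learn the weights jointly rather than use the random ones above.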


1909.08402.pdf (pdf, 458 KB)

German Research Center for Artificial Intelligence