Curation Technologies for Cultural Heritage Archives: Analysing and transforming a heterogeneous data set into an interactive curation workbench

Georg Rehm; Martin Lee; Julian Moreno Schneider; Peter Bourgonje

In: 3rd International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019), May 9-10, Brussels, Belgium. ACM, 2019.


We present a curation platform, or workbench, that supports the semantic analysis, enrichment, visualisation and presentation of a document collection so that human users can intuitively interact with and explore it. The data set is the result of a research project, carried out by scholars from South Korea, in which official German government documents on German reunification were collected, intellectually curated, analysed, interpreted and published in multiple volumes. The documents we worked with are mostly in German; a small subset, mostly summaries, is in Korean. This paper describes the original research project that generated the data set and focuses on the platform and the Natural Language Processing (NLP) pipeline that was adapted and extended for this project (e.g., OCR was added). Our key objective is to develop an interactive curation workbench that lets users interact with the data set in ways that go beyond the currently published version of the collection, a set of PDF documents available online. The paper concludes with suggestions for improving the platform and for future work.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence