

ADA: Automatic Data Annotation for Data Ecosystems

Natalie Gdanitz; Sabine Janzen; Hannah Stein; Amin Harig; Wolfgang Maaß
In: Irini Fundulaki; Kouji Kozaki; Jose Manuel Gomez-Perez; Daniel Garijo (Eds.). Proceedings of the ISWC 2023 Posters, Demos and Industry Tracks. 22nd International Semantic Web Conference (ISWC-2023), November 6-10, Athens, Greece. Springer, 11/2023.


Data ecosystems have emerged as versatile platforms for managing and analyzing data from diverse sources, facilitating integration, collaboration, and governance across organizations and systems. Annotated data are crucial for efficient and effective large-scale data ecosystems. However, full-fledged automatic annotation approaches for data ecosystems are lacking, so annotation currently has to be done manually by experts. Addressing the specific annotation requirements of data ecosystems, we introduce ADA, an approach for automatic data annotation. ADA applies a semantic representation model called Data Product Description Object (DPDO) in JSON-LD and combines state-of-the-art models for metadata embeddings within an annotation pipeline. The approach extends technical metadata with concepts essential for data ecosystems, such as data provenance, quality, and accessibility. The effectiveness of ADA was evaluated using competency questions and data sets from diverse domains within the GAIA-X data ecosystem.
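The abstract does not publish the DPDO schema or name the embedding models, so the following is only a rough illustration of the core idea: deriving technical metadata from a dataset and extending it with provenance, quality, and accessibility in a JSON-LD object. This minimal Python sketch assumes public W3C/DCMI vocabularies (DCAT, Dublin Core terms, PROV-O, DQV) and a hypothetical `ex:` namespace for terms such as `ex:columns`, `ex:rowCount`, and `ex:completeness`. The function names `technical_metadata` and `describe_dataset` are likewise hypothetical. It is not ADA's implementation, and the embedding-based annotation stage is not modeled.

```python
import json

# Illustrative JSON-LD context built from public W3C/DCMI vocabularies.
# The "ex" namespace is a hypothetical placeholder; this is NOT the DPDO schema.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "ex": "http://example.org/hypothetical#",
}


def technical_metadata(rows):
    """Derive basic technical metadata from a list of dict-shaped records."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows) * len(columns)
    # A cell counts as filled if the key is present and its value is not None/"".
    filled = sum(
        1 for row in rows for col in columns if row.get(col) not in (None, "")
    )
    return {
        "columns": columns,
        "row_count": len(rows),
        "completeness": filled / total if total else 0.0,
    }


def describe_dataset(dataset_id, title, rows, source_id, access_url):
    """Wrap technical metadata with provenance, quality and accessibility."""
    tech = technical_metadata(rows)
    return {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Technical metadata (hypothetical ex: terms)
        "ex:columns": tech["columns"],
        "ex:rowCount": tech["row_count"],
        # Provenance: where the data came from
        "prov:wasDerivedFrom": {"@id": source_id},
        # Quality: a single completeness measurement
        "dqv:hasQualityMeasurement": {
            "@type": "dqv:QualityMeasurement",
            "dqv:isMeasurementOf": {"@id": "ex:completeness"},
            "dqv:value": tech["completeness"],
        },
        # Accessibility: how the data can be obtained
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},
        },
    }


if __name__ == "__main__":
    records = [{"city": "Athens", "temp_c": 21}, {"city": "Berlin", "temp_c": None}]
    doc = describe_dataset(
        "ex:weather-2023", "Example weather readings", records,
        "ex:station-feed", "https://example.org/weather.csv",
    )
    print(json.dumps(doc, indent=2))
```

Keeping each concept (provenance, quality, accessibility) under an established vocabulary term means the output remains plain JSON-LD that generic Linked Data tooling can consume. A real pipeline would populate these fields automatically from the data rather than from hand-supplied arguments.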