

iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

Benjamin Adrian; Jörn Hees; Ludger van Elst; Andreas Dengel
In: Bärbel Mertsching; Marcus Hund; Zaheer Aziz (Eds.). KI 2009: Advances in Artificial Intelligence. German Conference on Artificial Intelligence (KI-2009), September 15-18, Paderborn, Germany, Pages 249-256, Lecture Notes in Artificial Intelligence (LNAI), Vol. 5803, ISBN 978-3-642-04616-2, Springer-Verlag, Heidelberg, 9/2009.


Due to the huge amount of text data on the Web, annotating unstructured text with semantic markup is a central topic in Semantic Web research. This work formally analyzes how domain ontologies are incorporated into information extraction tasks in iDocument. Ontology-based information extraction exploits the formalized, structured knowledge in domain ontologies to extract domain-relevant information from un-annotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies when performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
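
To make the idea of an exchangeable ontology plus a query-style extraction template more concrete, the following is a minimal sketch in plain Python. It is not iDocument's implementation or API: the toy ontology, the `ex:` identifiers, and the functions `recognize_instances`, `match_template`, and `annotate` are all hypothetical, and the template is a simplified stand-in for a SPARQL basic graph pattern rather than a real SPARQL engine.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy domain ontology as (subject, predicate, object) triples.
ONTOLOGY: Set[Triple] = {
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "rdfs:label", "Alice Example"),
    ("ex:Alice", "ex:memberOf", "ex:AcmeLab"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Bob", "rdfs:label", "Bob Sample"),
    ("ex:Bob", "ex:memberOf", "ex:AcmeLab"),
    ("ex:AcmeLab", "rdf:type", "ex:Organization"),
    ("ex:AcmeLab", "rdfs:label", "Acme Lab"),
}

# Hypothetical extraction template, roughly corresponding to the SPARQL query
#   SELECT ?person ?org WHERE { ?person rdf:type ex:Person .
#                               ?person ex:memberOf ?org . }
TEMPLATE: List[Triple] = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:memberOf", "?org"),
]


def recognize_instances(text: str, ontology: Set[Triple]) -> Set[str]:
    """Pipeline step 1: find instances whose rdfs:label occurs in the text."""
    lowered = text.lower()
    return {s for (s, p, o) in ontology
            if p == "rdfs:label" and o.lower() in lowered}


def _unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """Match one pattern term against one triple value, extending the binding."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_template(patterns: List[Triple], ontology: Set[Triple],
                   mentioned: Set[str]) -> List[Dict[str, str]]:
    """Pipeline step 2: evaluate the template against the ontology and keep
    only solutions whose bound instances were all recognized in the text."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in ontology:
                candidate = dict(binding)
                if all(_unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        bindings = extended
    return [b for b in bindings if all(v in mentioned for v in b.values())]


def annotate(text: str, ontology: Set[Triple],
             template: List[Triple]) -> List[Dict[str, str]]:
    """Run the two steps in sequence; the ontology is a plain argument,
    so it can be exchanged without touching the pipeline code."""
    return match_template(template, ontology, recognize_instances(text, ontology))


if __name__ == "__main__":
    print(annotate("Alice Example gave a talk at Acme Lab.", ONTOLOGY, TEMPLATE))
```

In this sketch, exchanging the domain amounts to passing a different set of triples and a different template to `annotate`; a real system would more plausibly use an RDF store and a full SPARQL processor in place of the hand-written matcher.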