iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

Benjamin Adrian, Jörn Hees, Ludger van Elst, Andreas Dengel

In: Bärbel Mertsching, Marcus Hund, Zaheer Aziz (eds.): KI 2009: Advances in Artificial Intelligence. German Conference on Artificial Intelligence (KI-2009), September 15-18, Paderborn, Germany, pages 249-256. Lecture Notes in Artificial Intelligence (LNAI) 5803, ISBN 978-3-642-04616-2, Springer-Verlag, Heidelberg, 9/2009.


Due to the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes how domain ontologies are incorporated into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which hold formalized and structured domain knowledge, to extract domain-relevant information from unannotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture and its use of SPARQL queries as extraction templates, and presents an evaluation of iDocument in an automatic document annotation scenario.
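To make the idea of a SPARQL query serving as an extraction template more concrete, here is a minimal Python sketch. It is not iDocument's implementation or API: the toy ontology (`ex:City`, `ex:Country`, `ex:locatedIn`), the helper names `parse_template`, `recognize` and `fill_template`, the line-based parsing of the WHERE clause, and the use of plain label matching in place of real named-entity recognition are all hypothetical simplifications. A real system would use a proper SPARQL parser and a richer extraction pipeline.

```python
import re
from itertools import product

# Hypothetical toy domain ontology (not iDocument's): instance label -> (URI, class)
ONTOLOGY_INSTANCES = {
    "Paderborn": ("ex:Paderborn", "ex:City"),
    "Germany": ("ex:Germany", "ex:Country"),
    "DFKI": ("ex:DFKI", "ex:Organization"),
}

# Hypothetical background facts (subject, property, object) used to validate candidates
ONTOLOGY_FACTS = {("ex:Paderborn", "ex:locatedIn", "ex:Germany")}

# A SPARQL SELECT query acting as an extraction template: it names the classes
# and the relation the caller wants extracted from the text.
TEMPLATE = """
SELECT ?city ?country WHERE {
  ?city a ex:City .
  ?country a ex:Country .
  ?city ex:locatedIn ?country .
}
"""


def parse_template(query):
    """Naive parser (assumption): one triple pattern per line inside the WHERE braces."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    patterns = []
    for line in body.splitlines():
        line = line.strip().rstrip(".").strip()
        if line:
            patterns.append(tuple(line.split()))
    return patterns


def recognize(text):
    """Stage 1: find ontology instances whose labels occur as whole words in the text."""
    found = {}
    for label, (uri, cls) in ONTOLOGY_INSTANCES.items():
        if re.search(r"\b" + re.escape(label) + r"\b", text):
            found[uri] = cls
    return found


def fill_template(text, query):
    """Stage 2: bind template variables to recognized instances and keep only
    bindings whose relation patterns hold in the background facts."""
    entities = recognize(text)
    patterns = parse_template(query)
    variables = sorted({t for pat in patterns for t in pat if t.startswith("?")})
    candidates = {v: list(entities) for v in variables}
    # Type patterns ("?x a Class") restrict each variable's candidate instances.
    for s, p, o in patterns:
        if p == "a" and s.startswith("?"):
            candidates[s] = [u for u in candidates[s] if entities[u] == o]
    results = []
    # Try every combination of candidates and check the remaining relation patterns.
    for combo in product(*(candidates[v] for v in variables)):
        binding = dict(zip(variables, combo))
        relations = [(binding.get(s, s), p, binding.get(o, o))
                     for s, p, o in patterns if p != "a"]
        if all(triple in ONTOLOGY_FACTS for triple in relations):
            results.append(binding)
    return results
```

For the sentence "The KI 2009 conference took place in Paderborn, Germany.", `fill_template(text, TEMPLATE)` returns the single binding `{'?city': 'ex:Paderborn', '?country': 'ex:Germany'}`. In this sketch, exchanging the domain ontology only means swapping `ONTOLOGY_INSTANCES` and `ONTOLOGY_FACTS`; the template and pipeline code stay unchanged, which loosely mirrors the exchangeability described above.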


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz