Semantic Analysis of Text Regions Surrounding Images in Web Documents

Thierry Declerck, Manuel Alcántara Plá

In: Proceedings of the International Workshop OntoImage'2006: Language Resources for Content-Based Image Retrieval. International Workshop on Language Resources for Content-Based Image Retrieval (OntoImage) 2006.


In this paper we present ongoing work and ideas on how to relate text-based semantics to images in web documents. We suggest applying different levels of Natural Language Processing (NLP) to the textual documents and speech transcripts associated with images, in order to provide structured linguistic information that can be merged with available domain knowledge to generate additional semantic metadata for the images. An issue to be addressed specifically in the near future is the automatic detection of the text or speech transcripts relevant to a given image (or video sequence). Beyond the time code approach, with its shortcomings, we expect the workshop's discussion of the lexical characteristics of the language that can or should be used to describe image content to improve the approaches we are currently working with.

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz