

Multi-Modal Scene Interpretation

Michael Wünstel; Thomas Röfer
In: Sabine Timpf (ed.). KI - Künstliche Intelligenz, German Journal on Artificial Intelligence - Organ des Fachbereiches "Künstliche Intelligenz" der Gesellschaft für Informatik e.V. (KI), Vol. Themenheft Räumliche Mobilität (special issue on spatial mobility), Pages 69-71, Fachbereich KI der Gesellschaft für Informatik e.V., BöttcherIT Verlag, 2008.


The visionary goal of developing an easy-to-use service robot implies several key tasks, such as speech understanding, object recognition and scene understanding. Besides the more sensor-oriented capabilities, such systems need extensive meta-knowledge, e.g., about mental representations of spatial relations, so that the views of man and machine can be matched. Only if all parts fit together can unrestricted man-machine communication be established. A cognitive system therefore has to address many different parts that must be integrated, both in the technical sense and, especially, at the level of the cognitive models. In particular, when a perceptive component, a spatial reasoning component and a speech recognition and synthesis component are connected, the probabilistic domain of object recognition has to be coupled with the logical domain of formal reasoning. The cognitive vision system ORCC presented here combines diverse recognition strategies that provide an extensive description of an unprepared scene. In a first step, the room boundaries and structurally simple objects such as tables are extracted using both functional and structural properties. Further objects are then segmented based on their position, followed by a structurally more complex recognition step and a more shape-oriented one. This spatial information is then enriched with colour-based information about the objects. Finally, the resulting scene description can be used as input for a speech-based man-machine dialogue about the objects within the scene.
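The abstract does not say how ORCC implements each stage, so the following is only a minimal Python sketch of the staged ordering it describes, not the authors' method. It assumes a scene that has already been split into segments with axis-aligned bounding boxes; the `Segment` type, every threshold (e.g. table tops between 0.6 m and 0.9 m, a 3 cm contact tolerance), the shape rules, the confidence values and the colour palette are hypothetical. It also illustrates one simple way of coupling probabilistic recognition with logical reasoning: only labels whose confidence passes a threshold are turned into symbolic facts that a dialogue or reasoning component could query.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical input: pre-segmented scene regions with axis-aligned bounding
# boxes (metres) and a mean RGB colour (0-255). Not ORCC's data model.
@dataclass
class Segment:
    sid: int
    is_plane: bool
    normal_up: bool            # True for horizontal planes (normal points up)
    z_min: float
    z_max: float
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    rgb: Tuple[int, int, int]
    label: Optional[str] = None
    confidence: float = 0.0
    colour: Optional[str] = None

def extract_structures(segments: List[Segment]) -> None:
    """Stage 1: room boundaries and tables from planar segments (hypothetical thresholds)."""
    for s in segments:
        if not s.is_plane:
            continue
        if s.normal_up and s.z_max < 0.05:
            s.label, s.confidence = "floor", 0.95
        elif not s.normal_up and s.z_max - s.z_min > 2.0:
            s.label, s.confidence = "wall", 0.9
        elif s.normal_up and 0.6 <= s.z_max <= 0.9:
            # structural cue (horizontal plane) plus functional cue (usable height)
            s.label, s.confidence = "table", 0.8

def on_top_of(obj: Segment, support: Segment, tol: float = 0.03) -> bool:
    """True if obj rests on the top of support and lies within its footprint."""
    return (abs(obj.z_min - support.z_max) <= tol
            and support.x_min <= obj.x_min and obj.x_max <= support.x_max
            and support.y_min <= obj.y_min and obj.y_max <= support.y_max)

def segment_by_position(segments: List[Segment]) -> List[Segment]:
    """Stage 2: unlabelled segments resting on a table become object candidates."""
    tables = [s for s in segments if s.label == "table"]
    return [s for s in segments
            if s.label is None and any(on_top_of(s, t) for t in tables)]

def classify_shape(obj: Segment) -> None:
    """Stage 3: crude shape rule on bounding-box proportions (hypothetical)."""
    width = max(obj.x_max - obj.x_min, obj.y_max - obj.y_min)
    height = obj.z_max - obj.z_min
    if height > 2 * width:
        obj.label, obj.confidence = "bottle", 0.7
    elif height < 0.3 * width:
        obj.label, obj.confidence = "plate", 0.6
    else:
        obj.label, obj.confidence = "cup", 0.4   # weak evidence, below threshold

PALETTE = {"red": (200, 30, 30), "green": (30, 160, 60), "blue": (30, 60, 200),
           "brown": (110, 70, 40), "white": (240, 240, 240), "black": (20, 20, 20)}

def name_colour(rgb: Tuple[int, int, int]) -> str:
    """Stage 4: nearest palette entry by squared RGB distance."""
    return min(PALETTE, key=lambda n: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[n])))

def scene_facts(segments: List[Segment], threshold: float = 0.5) -> List[tuple]:
    """Stage 5: promote only confident labels to symbolic facts."""
    known = [s for s in segments if s.label and s.confidence >= threshold]
    facts = [("isa", s.sid, s.label) for s in known]
    facts += [("colour", s.sid, s.colour) for s in known if s.colour]
    facts += [("on", a.sid, b.sid) for a in known for b in known
              if a is not b and b.label == "table" and on_top_of(a, b)]
    return facts

def interpret(segments: List[Segment]) -> List[tuple]:
    """Run the stages in order and return the symbolic scene description."""
    extract_structures(segments)
    for obj in segment_by_position(segments):
        classify_shape(obj)
    for s in segments:
        if s.label is not None:
            s.colour = name_colour(s.rgb)
    return scene_facts(segments)

def demo_scene() -> List[Segment]:
    return [
        Segment(0, True, True, 0.0, 0.01, 0, 5, 0, 4, (120, 100, 80)),       # floor
        Segment(1, True, False, 0.0, 2.5, 0, 5, 0, 0.02, (240, 240, 240)),   # wall
        Segment(2, True, True, 0.70, 0.74, 1, 2, 1, 1.8, (90, 60, 30)),      # table top
        Segment(3, False, False, 0.74, 0.99, 1.2, 1.28, 1.2, 1.28, (30, 160, 60)),
        Segment(4, False, False, 0.74, 0.76, 1.5, 1.74, 1.3, 1.54, (240, 240, 240)),
    ]

facts = interpret(demo_scene())
for fact in facts:
    print(fact)
```

For the demo scene, the printed facts include `('isa', 3, 'bottle')`, `('colour', 3, 'green')` and `('on', 3, 2)`, i.e. a green bottle standing on the table. The hard confidence threshold is the simplest possible bridge between the probabilistic and the logical side; it discards uncertainty, and a fuller system might instead keep confidences attached to the facts.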