Towards Wearable Attention-Aware Systems in Everyday Environments

Takumi Toyama

PhD-Thesis TU Kaiserslautern 11/2015.


Attention-awareness is a key topic for the upcoming generation of computer-human interaction. A human moves his or her eyes to visually attend to a particular region in a scene. Consequently, he or she can process visual information rapidly and efficiently without being overwhelmed by the vast amount of information from the environment. This physiological function, called visual attention, provides a computer system with valuable information about the user, from which it can infer his or her activity and the surrounding environment. For example, a computer can infer whether the user is reading text by analyzing his or her eye movements. Furthermore, it can infer which object he or she is interacting with by recognizing the object the user is looking at. Recent developments in mobile eye tracking technologies enable us to capture human visual attention in ubiquitous everyday environments. There are various types of applications in which attention-aware systems may be effectively incorporated. Typical examples are augmented reality (AR) applications such as Wikitude, which overlay virtual information onto physical objects. This type of AR application presents augmentative information about recognized objects to the user. However, if it presents information about all recognized objects at once, the overload of information could be obtrusive to the user. As a solution to this problem, attention-awareness can be integrated into a system. If a system knows which object the user is attending to, it can present only the information about relevant objects. Towards attention-aware systems in everyday environments, this thesis presents approaches for analyzing user attention to visual content. Using a state-of-the-art wearable eye tracking device, one can measure the user's eye movements in a mobile scenario. By capturing the user's eye gaze position in a scene and analyzing the image region where the eyes focus, a computer can recognize the visual content the user is currently attending to.
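As a rough illustration of the last step, the sketch below crops the region of a scene image around a gaze point so that a recognizer could then be run on that region only. It is a minimal sketch under stated assumptions and not the method used in the thesis: the function name `crop_gaze_patch`, the square window, the `half_size` parameter, and the representation of the image as a list of pixel rows are all hypothetical.

```python
def crop_gaze_patch(image, gaze_x, gaze_y, half_size):
    """Return the square region of `image` centred on the gaze point.

    `image` is a list of rows, each row a list of pixel values
    (hypothetical representation chosen to keep the sketch dependency-free).
    The window is clamped to the image borders, so patches near an edge
    are smaller than (2 * half_size + 1) pixels on a side.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Clamp the window to the image bounds.
    top = max(0, gaze_y - half_size)
    bottom = min(height, gaze_y + half_size + 1)
    left = max(0, gaze_x - half_size)
    right = min(width, gaze_x + half_size + 1)

    return [row[left:right] for row in image[top:bottom]]


# Toy 4x5 "scene image" whose pixel value encodes its row and column.
scene = [[r * 10 + c for c in range(5)] for r in range(4)]
patch = crop_gaze_patch(scene, gaze_x=2, gaze_y=1, half_size=1)
```

In a real system the patch would be passed on to an image analysis module such as object recognition or OCR; the window size would have to account for eye tracker accuracy, which this sketch ignores.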
I propose several image analysis methods to recognize the user-attended visual content in a scene image. For example, I present an application called Museum Guide 2.0, in which image-based object recognition and eye gaze analysis are combined to recognize user-attended objects in a museum scenario. Similarly, optical character recognition (OCR), face recognition, and document image retrieval are each combined with eye gaze analysis to identify the user-attended visual content in their respective scenarios. In addition to Museum Guide 2.0, I present other applications in which these combined frameworks are effectively used. The proposed applications show that the user can benefit from active information presentation, which augments the attended content in a virtual environment with a see-through head-mounted display (HMD). Beyond the individual attention-aware applications mentioned above, this thesis presents a comprehensive framework that combines all recognition modules to recognize the user-attended visual content when various types of visual information resources, such as text, objects, and human faces, are present in one scene. In particular, two processing strategies are proposed. The first one selects an appropriate image analysis module according to the user's current cognitive state. The second one runs all image analysis modules simultaneously and merges the analytic results afterwards. I compare these two processing strategies in terms of user-attended visual content recognition when multiple visual information resources are present in the same scene. Furthermore, I present novel interaction methodologies for a see-through HMD using eye gaze input. A see-through HMD is a suitable device for a wearable attention-aware system in everyday environments because the user can also view his or her physical environment through the display.
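The difference between the two processing strategies for the comprehensive framework can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `(label, confidence)` return convention of each module, the mapping from cognitive state to module, and the merge rule (keep the most confident result) are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, Tuple

# Hypothetical module interface: takes a gaze patch, returns (label, confidence).
Recognizer = Callable[[object], Tuple[str, float]]


def select_then_recognize(
    patch: object,
    cognitive_state: str,
    modules: Dict[str, Recognizer],
    state_to_module: Dict[str, str],
) -> Tuple[str, str, float]:
    """Strategy 1: pick one module from the user's cognitive state, run only it."""
    module_name = state_to_module[cognitive_state]
    label, confidence = modules[module_name](patch)
    return module_name, label, confidence


def recognize_all_then_merge(
    patch: object,
    modules: Dict[str, Recognizer],
) -> Tuple[str, str, float]:
    """Strategy 2: run every module, then merge the results.

    Assumption: merging means keeping the most confident result.
    """
    results = [(name, *recognize(patch)) for name, recognize in modules.items()]
    return max(results, key=lambda result: result[2])


# Stand-in recognizers with fixed outputs, for demonstration only.
demo_modules: Dict[str, Recognizer] = {
    "ocr": lambda patch: ("hello", 0.4),
    "object": lambda patch: ("vase", 0.9),
    "face": lambda patch: ("alice", 0.2),
}
demo_state_map = {"reading": "ocr", "looking_at_object": "object"}

by_state = select_then_recognize("patch", "reading", demo_modules, demo_state_map)
merged = recognize_all_then_merge("patch", demo_modules)
```

The first strategy runs a single module per frame but depends on a correct estimate of the cognitive state; the second needs no such estimate but pays for every module on every frame and relies on the merge rule to resolve disagreements.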
I propose methods for estimating the user's attentional engagement with the display, eye gaze-driven proactive user assistance functions, and a method for interacting with a multi-focal see-through display. Contributions of this thesis include:
• An overview of the state of the art in attention-aware computer-human interaction and attention-integrated image analysis.
• Methods for the analysis of user-attended visual content in various scenarios.
• Demonstration of the feasibility and the benefits of the proposed user-attended visual content analysis methods with practical user-supportive applications.
• Methods for interaction with a see-through HMD using eye gaze.
• A comprehensive framework for recognition of user-attended visual content in a complex scene where multiple visual information resources are present.
This thesis opens a novel field of wearable computer systems in which computers can understand the user's attention in everyday environments and provide what the user wants. I will show the potential of such wearable attention-aware systems in everyday environments for the next generation of pervasive computer-human interaction.


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence