Visual Search Target Inference in Natural Interaction Settings with Machine Learning

Michael Barz, Sven Stauden, Daniel Sonntag

In: Andreas Bulling, Anke Huckauf, Eakta Jain, Ralph Radach, Daniel Weiskopf (Eds.). ACM Symposium on Eye Tracking Research and Applications (ETRA 2020), Stuttgart, Germany. Association for Computing Machinery, New York, NY, USA, May 2020.


Visual search is a perceptual task in which humans aim to identify a search target object, such as a traffic sign, among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing a person's overt behavioral cues, e.g., gaze and the fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the surrounding image segment. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) to a recent baseline from the literature for predicting the target segment. In addition, we use a new search target inference dataset. The results show, first, that our new segmentation-based sequence encoding outperforms the method from the literature and, second, that it enables target inference in natural settings.
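The core encoding step described in the abstract can be illustrated with a minimal sketch. The following is an assumption-laden illustration, not the authors' implementation: each fixation is mapped to the semantic class of the image segment it lands on (a toy segmentation map stands in for the SegNet output), and the whole fixation sequence is summarized as a normalized class histogram, which could then serve as a feature vector for an SVM classifier. The class count and class labels are hypothetical.

```python
import numpy as np

# Assumed number of segmentation classes (SegNet-style outdoor scenes);
# the actual class set used in the paper may differ.
N_CLASSES = 12

def encode_fixations(fixations, seg_map, n_classes=N_CLASSES):
    """Encode a sequence of (x, y) fixations as a normalized histogram
    of the segment classes they land on."""
    hist = np.zeros(n_classes)
    for x, y in fixations:
        cls = seg_map[y, x]   # class index of the fixated pixel
        hist[cls] += 1
    if hist.sum() > 0:
        hist /= hist.sum()    # normalize so sequences of any length compare
    return hist

# Toy segmentation map: left half is class 3 (e.g., "road"),
# right half is class 7 (e.g., "sign"); purely illustrative labels.
seg_map = np.zeros((10, 10), dtype=int)
seg_map[:, :5] = 3
seg_map[:, 5:] = 7

# Two fixations land on class 3, one on class 7.
features = encode_fixations([(1, 2), (3, 4), (8, 8)], seg_map)
```

A vector like `features` could be fed, together with vectors from other search sequences, into a standard SVM (e.g., `sklearn.svm.SVC`) to predict the target segment class, which matches the histogram-plus-SVM pipeline sketched in the abstract.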


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence