

Visual Search Target Inference Using Bag of Deep Visual Words

Sven Stauden; Michael Barz; Daniel Sonntag
In: Frank Trollmann; Anni-Yasmin Turhan (Eds.). KI 2018: Advances in Artificial Intelligence - 41st German Conference on AI. German Conference on Artificial Intelligence (KI-2018), September 24-28, Berlin, Germany, Springer, 8/2018.


Visual search target inference subsumes methods for predicting, through eye tracking, the object a person is looking for. We predict the object a person intends to find in a visual scene based on their fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses the Bag of Visual Words encoding, which is common in computer vision applications. We evaluate our method using a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular when fixations on the target are excluded.
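The general Bag of (Deep) Visual Words idea referenced in the abstract can be sketched as follows: local descriptors (here, stand-in random vectors playing the role of CNN activations extracted around fixation locations) are clustered into a visual-word vocabulary, and each sample is then encoded as a normalized histogram of word assignments. This is a minimal, hypothetical illustration of the general technique, not the paper's actual pipeline; the feature dimensionality, vocabulary size, and helper names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_words=8, seed=0):
    """Cluster local descriptors into a vocabulary of visual words."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(descriptors)

def encode_bovw(descriptors, vocabulary):
    """Encode a descriptor set as a normalized visual-word histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()

# Stand-in for deep descriptors (e.g. CNN activations per fixation patch);
# real descriptors would come from a pre-trained network.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(200, 64))
vocab = build_vocabulary(train_descriptors)

# Encode one recording's fixation descriptors as a fixed-length feature vector.
encoding = encode_bovw(rng.normal(size=(30, 64)), vocab)
print(encoding.shape)  # one histogram bin per visual word
```

The resulting fixed-length histogram can then serve as input to a standard classifier that predicts the search target, regardless of how many fixations a recording contains.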