

Appearance-Based Gaze Estimation with Deep Neural Networks: From Data Collection to Evaluation

Ankur Bhatt; Ko Watanabe; Andreas Dengel; Shoya Ishimaru
In: Proceedings of the 5th International Conference on Activity and Behavior Computing (ABC '23), 2023. International Conference on Activity and Behavior Computing (ABC-2023), September 7-9, Kaiserslautern, Germany, Springer, 2023.


Gaze estimation is an important factor in human activity and behavior recognition. The technology is used in numerous applications such as human-computer interaction, driver monitoring systems, and surveillance. Gaze can be estimated with different technologies, such as wearable devices or cameras. Estimating gaze with a webcam is more accessible and convenient than methods that rely on specialized hardware such as infrared cameras. In this paper, we propose a data acquisition approach for modeling appearance-based webcam gaze estimation. We implement an application that captures gaze points using a common webcam. The application asks the participant to click on a circle displayed on the screen; whenever the circle is clicked, the face image and the pixel coordinate of the center of the circle are stored. From each of the 17 participants, we collected 50 samples, each pairing a face image with its pixel coordinate. We evaluated four gaze estimation models: VGG16, ResNet50, EfficientNetB7, and EfficientNetB2. On the test set, VGG16 (four feature extractors) performed best, with an error of 2.4 cm. To validate our model, we also applied leave-one-participant-out cross-validation and found that the error ranged from 2.533 cm for the best-predicted participant to 4.759 cm for the worst-predicted participant. The study contributes a data collection method, identifies the best-performing prediction model, and shows that individual differences between people make webcam-based gaze estimation difficult.
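The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the error is the Euclidean distance between the predicted and the clicked screen point, converted from pixels to centimetres with a monitor-specific factor; the abstract does not state how the error is computed. The names `pixel_error_cm`, `mean_error_cm`, `leave_one_participant_out`, and `cm_per_pixel` are hypothetical.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) screen coordinate in pixels


def pixel_error_cm(pred: Point, target: Point, cm_per_pixel: float) -> float:
    """Euclidean distance between two screen points, converted to cm.

    cm_per_pixel is an assumed, monitor-specific scale factor.
    """
    return math.hypot(pred[0] - target[0], pred[1] - target[1]) * cm_per_pixel


def mean_error_cm(
    preds: Sequence[Point], targets: Sequence[Point], cm_per_pixel: float
) -> float:
    """Average error in cm over paired predictions and click targets."""
    errors = [pixel_error_cm(p, t, cm_per_pixel) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


def leave_one_participant_out(
    samples: Dict[str, List[Point]],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (held_out_id, training_ids) once per participant.

    Each fold trains on all participants except one and tests on the one
    left out, so the test participant is never seen during training.
    """
    ids = sorted(samples)
    for held_out in ids:
        yield held_out, [pid for pid in ids if pid != held_out]
```

With 17 participants, `leave_one_participant_out` yields 17 folds. Computing `mean_error_cm` on each held-out participant gives one error per person, which is the kind of per-participant spread the abstract reports.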