Scene viewing and gaze analysis during phonetic segmentation tasks

Arif Khan, Ingmar Steiner, Ross Macdonald, Yusuke Sugano, Andreas Bulling

In: 18th European Conference on Eye Movements (ECEM), Vienna, Austria, August 16-21, 2015.


Phonetic segmentation is the process of splitting speech into individual sounds. Human experts perform this task manually by analyzing auditory and visual cues using analysis software, but one minute of speech can take an hour to segment. To improve automatic segmentation, which cannot yet match human experts' accuracy, we analyzed the behavior of experts performing segmentation tasks, using a stationary eye tracker. A 46 s recording of "The North Wind and the Sun" was segmented using standard phonetics software (Praat), and gaze activity was captured using a Tobii TX300 eye tracker. The computer screen and user interaction were recorded as well. Data collection is ongoing, and we plan to record 12 experts. During the task, experts zoom in to view short spans of audio; we analyzed their scene viewing behavior (fixation locations and durations), as well as the audio segments to which they listen. Moreover, activity over the entire task was analyzed within and across participants. Preliminary results provide new insights into experts' behavior during phonetic segmentation tasks. Identifying critical features of visible speech in this manner will allow us to model their importance for automatic segmentation. It also exposes behavioral differences across individual experts performing the same task.
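The fixation analysis mentioned above (turning raw gaze samples into fixation locations and durations) can be sketched with a simple dispersion-threshold approach. The function name, data layout, and threshold values below are illustrative assumptions, not the study's actual pipeline:

```python
# Minimal sketch of dispersion-threshold fixation detection (I-DT style).
# Input: (timestamp_ms, x, y) gaze samples; thresholds are assumed values.

def detect_fixations(samples, max_dispersion=30.0, min_duration=100.0):
    """Group (t_ms, x, y) gaze samples into fixations.

    A fixation is a maximal run of samples whose spatial dispersion
    (x-range + y-range, in pixels) stays under `max_dispersion` and
    that lasts at least `min_duration` milliseconds. Returns a list of
    (centroid_x, centroid_y, duration_ms) tuples.
    """
    fixations = []
    start = 0
    while start < len(samples):
        end = start + 1
        # Grow the window while spatial dispersion stays under threshold.
        while end < len(samples):
            xs = [s[1] for s in samples[start:end + 1]]
            ys = [s[2] for s in samples[start:end + 1]]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            end += 1
        duration = samples[end - 1][0] - samples[start][0]
        if duration >= min_duration:
            xs = [s[1] for s in samples[start:end]]
            ys = [s[2] for s in samples[start:end]]
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys), duration))
            start = end
        else:
            start += 1  # discard the leading sample and retry
    return fixations
```

With a 300 Hz tracker such as the TX300, samples arrive roughly every 3.3 ms, so a 100 ms minimum fixation spans about 30 samples; the thresholds would be tuned to the recording setup.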

ECEMPoster.pdf (PDF, 10 MB)

Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)