

Multimodal Driver Interaction using Gesture, Gaze and Speech

Abdul Rafey Aftab
In: ICMI '19: Proceedings of the 2019 International Conference on Multimodal Interaction (ICMI 2019), October 14-18, Suzhou, China, Pages 487-492, ISBN 978-1-4503-6860-5, ACM, 2019.


Ever-growing research in computer vision has created new avenues for user interaction. Speech commands and gesture recognition already complement touch-based input in various applications. It is therefore foreseeable that multimodal input methods are the next phase in the development of user interaction. In this paper, I propose a research plan of novel methods for using multimodal inputs in the semantic interpretation of human-computer interaction, applied specifically to a car driver. A fusion methodology must be designed that adequately combines a recognized gesture (specifically finger pointing), eye gaze and head pose to identify reference objects, while using the semantics of speech to provide a natural interactive environment for the driver. The proposed plan includes different techniques based on artificial neural networks for the fusion of the camera-based modalities (gaze, head pose and gesture). It then combines features extracted from speech with the fusion algorithm to determine the intent of the driver.
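The abstract does not specify the fusion architecture; the following is a minimal late-fusion sketch under assumed inputs. All dimensions, the single linear scoring layer, and the random weights are illustrative stand-ins for a trained network that would map gaze, head-pose and pointing direction vectors to a probability distribution over candidate reference objects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame modality features (3-D direction vectors are an
# assumption; a real system would use calibrated camera-based estimates).
gaze = rng.normal(size=3)       # estimated eye-gaze direction
head = rng.normal(size=3)       # estimated head-pose direction
pointing = rng.normal(size=3)   # estimated finger-pointing direction

# Late fusion: concatenate the modality features and score candidate
# objects with one linear layer + softmax (stand-in for the trained net).
features = np.concatenate([gaze, head, pointing])  # shape (9,)
W = rng.normal(size=(4, 9)) * 0.1                  # 4 candidate objects
scores = W @ features
probs = np.exp(scores - scores.max())
probs /= probs.sum()                               # softmax over objects

referenced = int(np.argmax(probs))  # index of the most likely object
```

Speech features could be concatenated into the same feature vector (early fusion) or used to re-weight the object probabilities (late fusion); the abstract leaves this design choice open.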
