

Time-compressed spoken words enhance driving performance in complex visual scenarios : evidence of crossmodal semantic priming effects in basic cognitive experiments and applied driving simulator studies

Angela Castronovo
PhD-Thesis, Universität des Saarlandes, Saarländische Universitäts- und Landesbibliothek, Saarbrücken, 9/2014.


Would speech warnings be a good option for informing drivers about time-critical traffic situations? Even though spoken words take time to be understood, listening is well trained from the earliest age and happens largely automatically. Therefore, it is conceivable that spoken words could immediately preactivate semantically identical (but physically diverse) visual information and thereby enhance its processing. Interestingly, this implies a crossmodal semantic effect of auditory information on visual performance. To examine this rationale, eight experiments were conducted in this thesis. Starting with powerful interference paradigms from basic cognitive psychology, we approached more realistic scenarios step by step. In Experiments 1A and 1B, we employed a crossmodal Stroop color identification task with auditory color words as primes and visual color patches as targets. Responses were faster for congruent priming than for neutral or incongruent priming. This effect also emerged for auditory primes that were time-compressed to 30 % or 10 % of their original length, and it was even more pronounced under high-perceptual-load conditions. To rule out stimulus-response compatibilities as a cause of the congruency effects, we altered the task in Experiment 2: after brief target displays, participants merely had to decide whether a target was present or absent. Nevertheless, target detection (d') was increased by congruent primes in comparison to incongruent or neutral primes. These results suggest semantic object-based auditory-visual interactions that automatically increase the salience of the denoted target object. Importantly, intentional or strategic listening was ruled out because the primes did not predict the identity of the subsequent targets: the conditional probability of a given prime being followed by a semantically matching target was only at chance level (no contingency).
Nevertheless, crossmodal semantic priming effects were efficient and fast: they occurred particularly in complex visual scenes and at a stimulus onset asynchrony (SOA) of only 100 ms. For all following experiments, we replaced colors as the relevant object feature with more specific automotive target icons and their two-syllable denotations. Using these materials and time compression (to 50 % and 30 % of the original length) in Experiment 3A, we replicated our earlier findings from Experiments 1A and 1B. Moreover, since warning systems are likely to present meaningful information, we slightly increased contingency from chance level (25 %) to 50 % in Experiment 3B. Overall, the pattern of benefits and costs remained similar to Experiment 3A. Interestingly, spoken word primes compressed to 30 % of their original duration led to faster reaction times when they were slightly predictive than when they were completely irrelevant (Exp. 3A vs. 3B). Accordingly, time compression improved responses as soon as listening became minimally useful. In another important step towards more realistic scenarios, we transferred our approach to three experiments (4, 5A, and 5B) in a 3D driving simulator. Here, gantry road signs showed the task-relevant visual information, and we replaced explicit target identification with a simple color classification (red vs. green) of the respective target. As a result, primes and responses were orthogonal, and stimulus-response compatibility effects could again be ruled out. Pronounced benefits were found both in a simple button-press task on the steering wheel (Exp. 4) and when participants executed complex driving maneuvers (Exps. 5A and 5B). When semantic contingency was increased considerably (to 80 %) in Experiment 5B, crossmodal facilitation and interference both increased to a similar extent, and the typical asymmetric pattern was retained.
In summary, across different materials, paradigms, and dependent variables, we repeatedly found crossmodal benefits in visual task performance due to semantically congruent spoken words. These pronounced effects occurred even though the words were presented only very shortly before object detection or classification and, most importantly, even when listening to the spoken words was not beneficial (or intended). Based on our findings, we assume that spoken words generally have the potential to rapidly preactivate visual information via semantic representations. Moreover, strategic listening effects seem to add to this automatic process. These basic findings are highly encouraging for information transfer via speech warnings in time-critical on-road situations, since faster situation assessment and improved reactions could mitigate or even prevent accidents.

