Speech-overlapped Acoustic Event Detection for Automotive Applications

Christian Müller; Joan-Isaac Biel; Edward Kim; Daniel Rosario

In: Proceedings of the Interspeech 2008. Conference in the Annual Series of Interspeech Events (INTERSPEECH-2008), September 22-26, Brisbane, Australia, 2008.


We present two approaches to acoustic event detection for speech-enabled car applications: a generative GMM-UBM approach and a discriminative GMM-SVM supervector approach. The systems detect whether a given acoustic event occurred while the car's built-in microphone was recording a spoken command, whether before, during, or after the driver's speech. These events can be music playing, a phone ringing, or a passenger other than the driver talking, laughing, or coughing. The task is formally defined as a detection task along the lines of well-established detection tasks such as speaker recognition or language recognition. Similarly, the evaluation procedure was designed to resemble the corresponding official evaluation series run by NIST (i.e., it was a blind, one-shot evaluation on a separately provided dataset). System performance was calculated in terms of detection miss and false alarm probabilities (C_Miss = C_FA = 1 and P_Target = 0.5). The superior GMM-SVM system scored 0.0345 for known test speakers and 0.1955 for novel test speakers. Frequency-filtered band energy (FFBE) coefficients outperformed MFCCs on this task. The results are promising and suggest further experiments on more data.
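
The abstract gives cost parameters (CMiss = CFA = 1, PTarget = 0.5) but not the formula that combines them. The sketch below assumes the usual NIST-style detection cost, C_Det = C_Miss * P_Miss * P_Target + C_FA * P_FA * (1 - P_Target), which may differ in detail from what the paper actually used. The function names, the thresholding convention (a score at or above the threshold counts as "event detected"), and the toy scores are illustrative assumptions, not taken from the paper.

```python
def error_rates(scores, labels, threshold):
    """Return (p_miss, p_fa) for a list of detection trials.

    scores    -- detector output per trial (higher = more likely a target)
    labels    -- True if the event really occurred in that trial
    threshold -- assumed decision rule: score >= threshold means "detected"
    """
    targets = [s for s, is_target in zip(scores, labels) if is_target]
    nontargets = [s for s, is_target in zip(scores, labels) if not is_target]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")
    # Miss: the event occurred but the score fell below the threshold.
    p_miss = sum(s < threshold for s in targets) / len(targets)
    # False alarm: no event occurred but the score reached the threshold.
    p_fa = sum(s >= threshold for s in nontargets) / len(nontargets)
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """NIST-style detection cost (assumed form, see lead-in).

    The defaults mirror the parameters quoted in the abstract.
    """
    for name, p in (("p_miss", p_miss), ("p_fa", p_fa), ("p_target", p_target)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


if __name__ == "__main__":
    # Toy trials, invented purely for illustration.
    toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    toy_labels = [True, True, True, False, False, False]
    p_miss, p_fa = error_rates(toy_scores, toy_labels, threshold=0.5)
    print(detection_cost(p_miss, p_fa))
```

With equal costs and PTarget = 0.5, this cost reduces to the plain average of the miss and false alarm probabilities, so under this assumed formula a lower value means a better detector.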


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence