A Fast-Match Approach for Robust, Faster than Real-Time Speaker Diarization

Y. Huang, O. Vinyals, G. Friedland, Christian Müller, N. Mirghafori, C. Wooters

In: Proceedings of the tenth biannual IEEE workshop on Automatic Speech Recognition and Understanding. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU-2007), December 9-13, Kyoto, Japan, 2007.


During the past few years, speaker diarization has achieved satisfactory accuracy as measured by the Diarization Error Rate (DER). The most successful approaches, which are based on agglomerative clustering, nevertheless exhibit an inherent computational complexity that makes real-time processing almost impossible, especially in combination with further processing steps. In this article we present a framework to speed up speaker diarization based on agglomerative clustering. The basic idea is to adopt a computationally cheap method to reduce the hypothesis space of the more expensive and more accurate model selection via the Bayesian Information Criterion (BIC). Two strategies, one based on the pitch correlogram and one on the unscented-transform approximation of the KL divergence, are used independently as a fast-match approach to select the most likely clusters to merge. We performed the experiments using the existing ICSI speaker diarization system. The new system, using the KL-divergence fast-match strategy, performs only 14% of the BIC comparisons needed by the baseline system and speeds up the system by 41% without affecting the DER. The result is a robust, faster than real-time speaker diarization system.
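
The fast-match idea from the abstract can be illustrated with a minimal Python sketch. It is not the ICSI implementation: it assumes each cluster is modelled by a diagonal-covariance GMM given as a `(weights, means, variances)` tuple, and the names `unscented_kl`, `fast_match_merge`, `shortlist_size` and the `bic_score` callable are hypothetical, introduced only for illustration. The cheap step ranks all cluster pairs by a symmetrised unscented-transform approximation of the KL divergence; the expensive BIC comparison, passed in as a placeholder callable, is then evaluated only on the shortlisted pairs. The pitch-correlogram strategy is not shown.

```python
import itertools
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at points x of shape (n, d)."""
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                 # (n, m, d)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    a = log_comp + np.log(weights)                           # (n, m)
    a_max = a.max(axis=1, keepdims=True)                     # log-sum-exp for stability
    return (a_max + np.log(np.exp(a - a_max).sum(axis=1, keepdims=True)))[:, 0]

def unscented_kl(f, g):
    """Unscented-transform approximation of KL(f || g) for diagonal GMMs.

    Each component of f contributes 2d sigma points placed at
    mean +/- sqrt(d * variance) along every coordinate axis.
    """
    weights, means, variances = f
    d = means.shape[1]
    total = 0.0
    for w_a, mu_a, var_a in zip(weights, means, variances):
        offsets = np.diag(np.sqrt(d * var_a))                # row k: offset along axis k
        pts = np.vstack([mu_a + offsets, mu_a - offsets])    # (2d, d) sigma points
        total += w_a * np.mean(gmm_logpdf(pts, *f) - gmm_logpdf(pts, *g))
    return total

def fast_match_merge(models, bic_score, shortlist_size):
    """Pick a merge candidate, running bic_score only on a KL-based shortlist.

    bic_score(i, j) is a placeholder for the expensive BIC comparison of
    clusters i and j (assumed here: higher means a better merge).
    """
    pairs = list(itertools.combinations(range(len(models)), 2))
    sym_kl = [unscented_kl(models[i], models[j]) + unscented_kl(models[j], models[i])
              for i, j in pairs]
    # Keep only the pairs with the smallest symmetrised divergence.
    shortlist = [pairs[k] for k in np.argsort(sym_kl)[:shortlist_size]]
    scores = [bic_score(i, j) for i, j in shortlist]
    best = int(np.argmax(scores))
    return shortlist[best], scores[best]
```

In an agglomerative loop, a caller would invoke `fast_match_merge` once per iteration, merge the returned pair if its score passes the stopping criterion, and retrain the affected model. With `shortlist_size` much smaller than the number of pairs, most BIC comparisons are skipped, which is the kind of saving the abstract reports.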

