The ICSI 2007 Language Recognition System

Christian Müller, Joan-Isaac Biel

In: Proceedings of the Odyssey 2008 Workshop on Speaker and Language Recognition. Odyssey Workshop on Speaker and Language Recognition (Odyssey-2008), January 21-24, Stellenbosch, South Africa, ISCA Archive, 2008.


In this paper, we describe the ICSI 2007 language recognition system. The core phonotactic part of the system is a variant of the classical PPRLM (parallel phone recognizer followed by language modeling) approach. However, the phone recognizers are replaced by "phone-like" acoustic sub-word unit recognizers (SWR) that are trained in an unsupervised fashion without making use of any phonetically labeled data. Analogously, the backend language modeling is replaced by SVMs, a more powerful, discriminative classification method. Besides sub-word unit n-grams, the SVM feature vector includes vector-quantized MFCC n-grams, representing the short-term cepstral component of the system. A pair of prosodic frontends is introduced as well by augmenting the standard sub-word units with binned pitch and energy values, respectively. Rank normalization is described as a normalization method superior to mean-variance normalization for this particular task. Preliminary results obtained on the LRE 2003 evaluation data set suggest that the multiple frontends can be combined effectively. The SWR frontend augmented with pitch performs best, followed by the standard SWR frontend. The results are discussed and objectives for (near) future work are described.
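The abstract does not spell out how the n-gram features are computed or normalized, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that each feature is the relative frequency of a sub-word unit n-gram in an utterance and that rank normalization replaces a feature value by its relative rank within a background sample of values for the same feature. The function names `ngram_frequencies` and `rank_normalize`, the tie-handling rule, and the example data are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def ngram_frequencies(units, n):
    """Relative frequency of each n-gram in a sequence of unit labels."""
    counts = Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def rank_normalize(background, values):
    """Map each value to its relative rank in [0, 1] within a background sample.

    Assumption: ties are resolved by averaging the lower and upper insertion
    points, so a value equal to a background entry lands mid-way through it.
    """
    ref = sorted(background)
    n = len(ref)
    normalized = []
    for v in values:
        lo = bisect_left(ref, v)   # background entries strictly below v
        hi = bisect_right(ref, v)  # background entries at or below v
        normalized.append((lo + hi) / (2 * n))
    return normalized


# Hypothetical unit sequence produced by one frontend for one utterance.
freqs = ngram_frequencies(["a", "b", "a", "b"], 2)

# Hypothetical background values of a single feature, with one outlier.
ranks = rank_normalize([0.0, 1.0, 2.0, 100.0], [1.0, 50.0, -5.0, 100.0])
```

Unlike mean-variance normalization, the rank mapping depends only on the ordering of the background values, so a single outlier such as 100.0 does not compress the remaining values into a narrow range.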

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence