Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications

Florian Metze, Jitendra Ajmera, Roman Englert, Udo Bub, Felix Burkhardt, Joachim Stegmann, Christian Müller, Richard Huber, Bernt Andrassy, Josef G. Bauer, Bernhard Littel

In: Proceedings of the 32nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), April 15-20, 2007, Honolulu, Hawaii, United States.


This paper presents a comparative study of four approaches to automatic age and gender classification with seven classes on a telephone speech task, and compares the results with human performance on the same data. The automatic approaches are based on (1) a parallel phone recognizer, derived from an automatic language identification system; (2) a system using dynamic Bayesian networks to combine several prosodic features; (3) a system based solely on linear prediction analysis; and (4) Gaussian mixture models (GMMs) over mel-frequency cepstral coefficients (MFCCs), with age and gender recognized separately. On average, the parallel phone recognizer performs as well as human listeners, but loses performance on short utterances. The system based on prosodic features, however, shows very little dependence on utterance length.
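Approach (4) can be sketched in Python. The sketch assumes one diagonal-covariance GMM per class, trained beforehand on MFCC frames, and a decision by the highest average per-frame log-likelihood. The model parameters, the 2-D toy feature vectors (real MFCC vectors typically have more dimensions), and names such as `classify` are hypothetical. The abstract does not say how the separate age and gender decisions are merged into the seven classes, so the sketch does not attempt that step.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a GMM is a list of (weight, means, variances)
# components, each with a diagonal covariance matrix.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log p(x) = log sum_k w_k N(x; mu_k, Sigma_k), computed via log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def utterance_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood, so utterances of different length compare fairly."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)


def classify(frames: Sequence[Sequence[float]], models: Dict[str, GMM]) -> str:
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: utterance_score(frames, models[label]))


# Toy usage with hypothetical, hand-set gender models (not trained on real data).
gender_models: Dict[str, GMM] = {
    "female": [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 1.5], [0.5, 0.5])],
    "male": [(1.0, [-1.0, -1.0], [0.5, 0.5])],
}
frames = [[1.2, 0.9], [1.8, 1.4], [1.1, 1.2]]
print(classify(frames, gender_models))  # female
```

Averaging over frames rather than summing keeps scores comparable across utterances of different length; a separate set of age-group GMMs could be scored the same way.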


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz