Interactive, intelligent speech technologies are entering our homes. In the Emonymous project, we aim to fully anonymize a speaker's identity without losing emotional or speech content information. From a data protection perspective, using speech data in this way also opens up enormous application potential.

The SLT contributes significant expertise in the areas of:

  • Speech synthesis, e.g. voice conversion (VC), speech-to-text (STT), differentiable digital signal processing (DDSP).
  • Speech recognition, e.g. automatic speech recognition (ASR), multilingual speech recognition.
  • Emotion recognition from speech, text, video/images, and multimodal input, e.g. Transformer-based models, acoustic, linguistic (language models), and visual models (facial expression, landmarks).
  • Crowd-based AI support, e.g. automated, online-orchestrated crowdsourcing and expert sourcing in hybrid AI+human workflows for high-quality data acquisition.
  • AI in the area of pre-trained language models, transfer-learning, cross-lingual learning, continuous learning, frugal AI.

Focus: Driven by steadily advancing AI, interactive and intelligent voice assistants are becoming part of more and more areas of everyday life. However, privacy concerns prevent them from being used beyond the home. In particular, the large amount of data collected makes it possible to identify a speaker by voice, which prevents effective use of these technologies in sensitive areas such as the health sector or learning support. For many applications, however, it is only necessary to know what was said, not who said it. Here, anonymizing the speaker identity can prevent identification during further (cloud-based) processing. At the same time, how something was said conveys further indicators (e.g. emotions, personality, proficiency) that are needed to respond adequately to the individual needs of the user and thus improve the interaction.

The goal of this joint project is to fully anonymize the speaker identity while preserving emotional and speech content information as far as possible. To this end, we rely on the latest AI developments in voice conversion or differentiable digital signal processing (DDSP).

In combination with a newly developed differentiable similarity measure, it is possible to derive indicators for the success of anonymization. The developed techniques make it possible to advance diverse innovative applications while preserving speaker anonymity, and they strengthen applications in science as well as Germany as a business location.
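The project's own similarity measure is not specified here. As a rough illustration of how a similarity score can serve as an anonymization indicator, the following minimal sketch uses plain cosine similarity between two speaker embeddings as a stand-in. Cosine similarity is differentiable for non-zero vectors, but it is an assumption of this sketch and not the measure developed in Emonymous. The function names, the toy embeddings, and the rescaling to [0, 1] are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def anonymization_indicator(original_emb, anonymized_emb):
    """Rescale similarity from [-1, 1] to a score in [0, 1].

    Higher values mean the anonymized voice is less similar to the
    original speaker (hypothetical scoring, for illustration only).
    """
    return (1.0 - cosine_similarity(original_emb, anonymized_emb)) / 2.0


# Hypothetical toy speaker embeddings; real systems would obtain these
# from a speaker verification model.
original = [0.9, 0.1, 0.3]
weakly_anonymized = [0.8, 0.2, 0.3]
strongly_anonymized = [-0.2, 0.9, -0.4]

weak_score = anonymization_indicator(original, weakly_anonymized)
strong_score = anonymization_indicator(original, strongly_anonymized)
print(f"weak: {weak_score:.3f}, strong: {strong_score:.3f}")
```

In this sketch, a voice that stays close to the original speaker receives a score near 0, while a voice that moves far away in embedding space receives a higher score.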

Lead: Dr. Tim Polzehl

Dr. Tim Polzehl leads the AI-based developments for speech-based applications in the Speech and Language Technology department at DFKI. In addition, he leads the area of "Next Generation Crowdsourcing and Open Data" and is an active member of the "Speech Technology" group of the Quality and Usability Labs (QU-Labs) at the Technical University of Berlin.

Technische Universität Berlin, Quality and Usability Lab
Otto-von-Guericke-Universität Magdeburg, Fachgebiet Mobile Dialogsysteme

Contact Person
Prof. Dr.-Ing. Sebastian Möller

Publications about the project

Neslihan Iskender, Tim Polzehl, Sebastian Möller

In: Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP-2021), located at EACL 2021, April 20, online. Pages 1-7, ISBN 978-1-954085-17-6, Association for Computational Linguistics, 4/2021.

Neslihan Iskender, Tim Polzehl, Sebastian Möller

In: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval-2021), located at EACL 2021, April 19, Kyiv, Ukraine. Pages 86-96, ISBN 978-1-954085-10-7, Association for Computational Linguistics, 2021.

Yuexin Cao, Vicente Ivan Sanchez Carmona, Xiaoyi Liu, Changjian Hu, Neslihan Iskender, André Beyer, Tim Polzehl, Sebastian Möller

In: HUCAPP 2022 - 6th International Conference on Human Computer Interaction Theory and Applications. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP-2021), Springer, 2021.


German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI)