Project | Emonymous

Duration: 08/01/2021 - 07/31/2023

Emonymous

Research Topics

Language & Text Understanding

Application fields

Interactive, intelligent speech technologies are conquering our homes. In the Emonymous project, we pursue the goal of completely anonymizing a speaker's identity without losing emotional or speech content information. From a data protection perspective, using speech data in this anonymized form also offers enormous application potential.

The Speech and Language Technology (SLT) department contributes significant expertise in the following areas:

Speech Synthesis, e.g. Voice Conversion (VC), Speech-to-Text (STT), Voice Cloning, Zero-Shot Learning.
Speech Recognition, e.g. Automatic Speech Recognition (ASR), Multi-Lingual Speech Recognition.
Speaker Recognition, e.g. Automatic Speaker Recognition and Verification (ASV), Multi-Lingual Speaker Recognition.
Emotion Recognition from Speech, Text, Video/Images, and Multimodal Data, e.g. Transformer-based Models, Acoustic, Linguistic (Language Models), and Visual Models (Facial Expression, Landmarks).
Crowd-based AI Support, e.g. automated, online-orchestrated hybrid AI+human workflows that combine crowdsourcing and expert sourcing for high-quality data acquisition.
AI in the area of pre-trained language models, transfer learning, cross-lingual learning, continuous learning, and frugal AI.

Focus: Due to ever-advancing AI, interactive and intelligent voice assistants are becoming part of more and more areas of everyday life. However, privacy concerns prevent them from being used beyond the home. In particular, the possibility of identifying a speaker by voice, given the large amount of data collected, prevents effective use of these technologies in sensitive areas such as the health sector or learning support. For many applications, however, it is only necessary to know what was said, not who said it. Here, anonymizing the speaker identity can prevent identification during (cloud-based) downstream processing. At the same time, how something was said conveys further indicators (e.g. emotions, personality, proficiency) that are needed to react adequately to the individual needs of the user and thus improve the interaction.

The goal of this joint project is to completely anonymize the speaker identity while preserving the emotional and speech content-related information as far as possible. For this purpose, we rely on the latest AI developments, such as Voice Conversion or Differentiable Digital Signal Processing (DDSP).

In combination with a newly developed differentiable similarity measure, indicators for the success of anonymization can be derived. The developed techniques make it possible to advance diverse innovative applications while preserving speaker anonymity, and they strengthen the application of science as well as Germany as a business location.

Lead: Dr. Tim Polzehl

Dr. Tim Polzehl leads the AI-based developments in the area of speech-based applications in the Speech and Language Technology department at DFKI. In addition, he leads the area of "Next Generation Crowdsourcing and Open Data" and is an active member of the "Speech Technology" group of the Quality and Usability Labs (QU-Labs) at the Technical University of Berlin.

Profile DFKI: https://www-live.dfki.de/web/ueber-uns/mitarbeiter/person/tipo02

Profile QU-Labs TU-Berlin: https://www.tu.berlin/index.php?id=29499/

Contact: tim.polzehl@dfki.de

Partners

Technische Universität Berlin, Quality and Usability Lab
Otto-von-Guericke-Universität Magdeburg, Fachgebiet Mobile Dialogsysteme

Contact Person

Dr.-Ing. Tim Polzehl

Tim.Polzehl@dfki.de
Phone: +49 30 23895 1863

Keyfacts

Involved research areas

Speech and Language Technology

Head

Dr.-Ing. Tim Polzehl

Publications

Fighting Disinformation: Overview of Recent AI-Based Collaborative Human-Computer Interaction for Intelligent Decision Support Systems
Tim Polzehl; Vera Schmitt; Nils Feldhus; Joachim Meyer; Sebastian Möller
In: Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - HUCAPP. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP-2022), Pages 267-278, ISBN 978-989-758-634-7, SciTePress, 2023.

RCT-Net: TDNN based Speaker Verification with 2D Res2Nets on Frame Level Feature Extractor
Razieh Khamsehashari; Fengying Miao; Tim Polzehl; Sebastian Möller
In: The Eighth International Conference on Advances in Signal, Image and Video Processing - SIGNAL 2023. International Conference on Advances in Signal, Image and Video Processing (SIGNAL-2023), March 13-17, Barcelona, Spain, ISBN 978-1-68558-057-5, IARIA, 2023.

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh; Arnab Das; Yamini Sinha; Ingo Siegert; Tim Polzehl; Sebastian Stober
In: Proc. INTERSPEECH 2023. Conference in the Annual Series of Interspeech Events (INTERSPEECH-2023), Pages 2093-2097, ISCA-speech, 2023.

Funding Authorities

BMBF - Federal Ministry of Education and Research
