Publication

Generalizable Audio Spoofing Detection using Non-Semantic Representations

Arnab Das; Yassine El Kheir; Carlos Franzreb; Tim Herzig; Tim Polzehl; Sebastian Moller (eds.)

Conference in the Annual Series of Interspeech Events (INTERSPEECH-2025), August 17-21, Rotterdam, Netherlands, ISCA - Interspeech 2025, 8/2025.

Abstract

Rapid advancements in generative modeling have made synthetic audio generation easy, leaving speech-based services vulnerable to spoofing attacks. Consequently, robust countermeasures are needed more than ever. Existing deepfake detection solutions are often criticized for lacking generalizability and fail drastically when applied to real-world data. This study proposes a novel method for generalizable spoofing detection that leverages non-semantic universal audio representations. Extensive experiments were performed to find suitable non-semantic features using the TRILL and TRILLsson models. The results indicate that the proposed method achieves performance comparable to state-of-the-art approaches on the in-domain test set while significantly outperforming them on out-of-domain test sets. Notably, it demonstrates superior generalization on public-domain data, surpassing methods based on hand-crafted features, semantic embeddings, and end-to-end architectures.

Projects

news-polygraph - Privacy, Transparency, Bias, and Fairness for Trustworthy Multimodal Disinformation Detection

Further Links

https://www.isca-archive.org/interspeech_2025/das25_interspeech.pdf

TrillFake_IS2025_(3).pdf (PDF, 223 KB)