Publication

BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

Yassine El Kheir; Tim Polzehl; Sebastian Moller (eds.)
Conference in the Annual Series of Interspeech Events (INTERSPEECH-2025), August 16-22, Netherlands, ISCA Arxiv, 9/2025.

Abstract

We propose BiCrossMamba-ST, a robust framework for speech deepfake detection built on a dual-branch spectro-temporal architecture that combines bidirectional Mamba blocks with mutual cross-attention. By processing spectral sub-bands and temporal intervals separately and then integrating their representations, BiCrossMamba-ST captures the subtle cues of synthetic speech. The framework also uses a convolution-based 2D attention map to focus on specific spectro-temporal regions. Operating directly on raw features, BiCrossMamba-ST achieves relative gains of 67.74% and 26.3% over the state-of-the-art AASIST on the ASVSpoof LA21 and ASVSpoof DF21 benchmarks, respectively, and a 6.80% improvement over RawBMamba on ASVSpoof DF21. Code and models will be made publicly available.
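
To make the idea of mutual cross-attention between a spectral and a temporal branch concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each branch has already been reduced to a sequence of feature vectors of equal width, and it omits the learned query/key/value projections, the bidirectional Mamba blocks, and the 2D attention map. All function names, shapes, and the residual connection are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product attention of `queries` over `context`.

    Simplification (assumption): no learned projections, so the context
    vectors serve as both keys and values.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights


def mutual_cross_attention(spectral, temporal):
    """Each branch attends to the other; a residual keeps its own features."""
    spec_out, _ = cross_attention(spectral, temporal)
    temp_out, _ = cross_attention(temporal, spectral)
    return spectral + spec_out, temporal + temp_out


# Hypothetical sizes: 4 spectral sub-band tokens, 6 temporal interval tokens,
# each represented by an 8-dimensional feature vector.
rng = np.random.default_rng(0)
spectral = rng.standard_normal((4, 8))
temporal = rng.standard_normal((6, 8))

spec_fused, temp_fused = mutual_cross_attention(spectral, temporal)
print(spec_fused.shape, temp_fused.shape)
```

In this sketch each branch keeps its own sequence length after fusion, so the spectral output still has one vector per sub-band and the temporal output one vector per interval. A real model would presumably add learned projections and multiple heads before pooling the two streams for classification.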