Publication

Non-intrusive Estimation of Packet Loss Rates in Speech Communication Systems Using Convolutional Neural Networks

Gabriel Mittag; Sebastian Möller
In: 2018 IEEE International Symposium on Multimedia (ISM). IEEE International Symposium on Multimedia (ISM-2018), December 10-12, Taichung, Taiwan, Province of China, Pages 105-109, ISBN 978-1-5386-6857-3, IEEE, 2018.

Abstract

In this paper, we analyze whether deep convolutional neural networks can be used to detect lost packets in speech communication systems. The speech quality of modern communication networks has improved significantly in recent years, for example through higher available audio bandwidth. This was made possible, among other reasons, by the use of packet-based networks, which allow fully digital transmission from the sender to the receiver terminal. However, these networks often suffer from frequent interruptions caused by packets that are lost due to transmission errors. Consequently, the packet loss rate is one of the main indicators of the quality of speech communication services. Nevertheless, information about how many packets are lost in a network is not always available. To estimate the number of lost packets, we calculate spectrograms of the transmitted speech signals and use them as input to a convolutional neural network. This approach has recently gained popularity in detection and recognition tasks for music and speech. The interruptions caused by lost packets can often be seen clearly in the spectrogram of the degraded signal. Therefore, it seems natural to interpret the spectrograms as images and to use deep learning methods that are common in image classification. The proposed model estimates the packet loss rate of a communication system using only the speech file recorded at the receiver side, without needing the reference speech signal that was originally sent through the channel. Our results show that the model reduces the prediction error by more than 75% when compared to a model that is based on MFCC features.
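
The observation that lost packets are visible in the spectrogram can be illustrated with a minimal sketch. It is not the authors' pipeline and contains no neural network. It rests on several assumptions that are not from the paper: a synthetic two-tone signal stands in for speech, packets are 20 ms long, lost packets are replaced by silence (no packet loss concealment), and analysis frames are aligned with packet boundaries. The function names `simulate_packet_loss` and `log_spectrogram` are hypothetical.

```python
import numpy as np


def simulate_packet_loss(signal, sample_rate, loss_rate, packet_ms=20, seed=0):
    """Zero out randomly chosen packets.

    Assumption: lost packets are replaced by silence, i.e. the receiver
    applies no packet loss concealment.
    """
    rng = np.random.default_rng(seed)
    packet_len = int(sample_rate * packet_ms / 1000)
    n_packets = len(signal) // packet_len
    lost = rng.random(n_packets) < loss_rate  # boolean mask, one entry per packet
    degraded = signal.copy()
    for i in np.flatnonzero(lost):
        degraded[i * packet_len:(i + 1) * packet_len] = 0.0
    return degraded, lost


def log_spectrogram(signal, frame_len=320, hop=320, eps=1e-10):
    """Log-magnitude spectrogram in dB with shape (freq_bins, frames)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + eps).T


# Synthetic stand-in for a speech recording: two seconds of two tones at 16 kHz.
sample_rate = 16000
t = np.arange(2 * sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

degraded, lost = simulate_packet_loss(clean, sample_rate, loss_rate=0.1)
spec = log_spectrogram(degraded)

# Columns belonging to lost packets sit at the numerical floor of the
# spectrogram, which is the kind of visual gap an image model could pick up.
print("simulated loss rate:", lost.mean())
print("mean level of lost frames (dB):", spec[:, lost].mean())
print("mean level of received frames (dB):", spec[:, ~lost].mean())
```

With real speech and a codec that conceals lost packets, the gaps are far less clean than in this sketch, which is one reason a learned model is attractive for the task.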