RNN-based signal classification for hybrid audio data compression

Bibliographic details
Published in:	Computing 2020-03, Vol.102 (3), p.813-827
Main authors:	Tu, Weiping; Yang, Yuhong; Du, Bo; Yang, Wanzhao; Zhang, Xiong; Zheng, Jiaxi
Format:	Article
Language:	eng
Subjects:	Artificial Intelligence; Audio data; Audio signals; Codec; Coding; Complexity; Computer Appl. in Administrative Data Processing; Computer Communication Networks; Computer Science; Computer Science, Theory & Methods; Data compression; Data management; Information Systems Applications (incl. Internet); Modal choice; Multimedia; Music; Quality; Recurrent neural networks; Science & Technology; Sequences; Signal classification; Software Engineering; Technology
Online access:	Full text

Description
Abstract:	Audio data are a fundamental component of multimedia big data. Switched audio codecs have proved efficient for compressing a wide range of audio signals at low bit rates. However, coding quality relies strongly on accurate classification of the input signals. AMR-WB+, the state-of-the-art switched audio coder, adopts two coding mode selection methods. The closed-loop method achieves good quality but has high computational complexity. Conversely, the open-loop method reduces complexity but yields unsatisfactory coding quality. Therefore, this study investigates a speech/music discrimination method based on a recurrent neural network (RNN) model to improve the coding performance of AMR-WB+. An RNN model is chosen for its outstanding performance in processing time series. Its recurrent structure allows it to learn and make full use of the temporal information in the input sequences, compensating for the deficiencies of short-term features. We quantitatively analyze the quality loss caused by the two types of misclassification and tune the parameter of the classifier to improve the signal-to-noise ratio (SNR) of the synthesized signals. The experimental results show that, compared with the open-loop method, the proposed method improves mode selection accuracy by 18% and coding quality by 0.21 dB in segmental SNR. Moreover, it reduces computational complexity by about 43% compared with the closed-loop method in AMR-WB+.
ISSN:	0010-485X 1436-5057
DOI:	10.1007/s00607-019-00713-8
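
The abstract reports coding quality as segmental SNR. As a rough illustration of that metric only, the following minimal sketch computes a mean per-frame SNR in dB. The function name `segmental_snr`, the frame length of 256 samples, and the choice to skip silent frames are assumptions made for this example; the record does not state the definition or parameters used in the paper.

```python
import math


def segmental_snr(reference, decoded, frame_len=256, eps=1e-12):
    """Mean per-frame SNR in dB between a reference and a decoded signal.

    Hypothetical helper: frame_len and the silent-frame rule are
    illustrative assumptions, not values taken from the paper.
    """
    n = min(len(reference), len(decoded))
    frame_snrs = []
    # Walk over non-overlapping full frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy < eps:
            continue  # skip silent frames, whose SNR is undefined
        frame_snrs.append(10.0 * math.log10(signal_energy / max(noise_energy, eps)))
    if not frame_snrs:
        raise ValueError("no non-silent full frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Toy usage: a 440 Hz tone sampled at 16 kHz, "decoded" with a 10% amplitude loss.
reference = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1024)]
decoded = [0.9 * x for x in reference]
print(round(segmental_snr(reference, decoded), 2))  # 20.0
```

In the toy case the error is exactly one tenth of the signal in every frame, so each frame's SNR is 10·log10(1/0.01) = 20 dB. Averaging per frame rather than over the whole signal keeps loud passages from dominating the score.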