CNN-RNN and Data Augmentation Using Deep Convolutional Generative Adversarial Network for Environmental Sound Classification

Bibliographic Details
Published in:	IEEE Signal Processing Letters, 2022, Vol. 29, pp. 682-686
Main authors:	Bahmei, Behnaz; Birmingham, Elina; Arzanpour, Siamak
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Artificial neural networks; Background noise; Classification; Convolution; Convolutional neural networks; Convolutional recurrent neural network (CRNN); Data augmentation; Data models; Datasets; Deep convolutional generative adversarial networks; Deep learning; Environmental sound classification; Feature extraction; Generative adversarial networks; Generators; Image classification; Machine learning; Model accuracy; Neural networks; Recurrent neural networks; Spectrograms; Training

Description
Summary:	Deep neural networks have been widely shown to achieve higher accuracy than traditional machine learning methods and to have distinct advantages in extracting features from data. While convolutional neural networks (CNNs) have shown great success in feature extraction and audio classification, real-time audio depends on preceding scenes. In addition, a main drawback of deep learning algorithms is that they need a large amount of training data to perform well. In this paper, a recurrent neural network (RNN) combined with a CNN is proposed to address the dependence on preceding scenes. Moreover, a Deep Convolutional Generative Adversarial Network (DCGAN) is used for high-quality data augmentation. This data augmentation technique is applied to the UrbanSound8K dataset to improve environmental sound classification. Batch normalization, transfer learning, and three feature representation maps are used to improve model accuracy. The results show that the images generated by the DCGAN have features similar to those of the original training images, and that the DCGAN can generate spectrograms that improve classification accuracy. Experimental results on the UrbanSound8K dataset demonstrate that the proposed CNN-RNN architecture achieves better performance than state-of-the-art classification models.
ISSN:	1070-9908, 1558-2361
DOI:	10.1109/LSP.2022.3150258