Parallel Recurrent Convolutional Neural Networks-Based Music Genre Classification Method for Mobile Devices

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 19629-19637
Authors: Yang, Rui; Feng, Lin; Wang, Huibing; Yao, Jianing; Luo, Sen
Format: Article
Language: English
Online Access: Full text
Description
Abstract: With the rapid development of the mobile Internet of Things (IoT) and mobile sensing devices, a large number of mobile-computing applications have attracted attention from both industry and academia. Deep-learning-based methods have achieved great success in artificial intelligence (AI) applications. To advance the development of AI-based IoT systems, effective and efficient algorithms are urgently needed for IoT edge computing. Time-series data classification is an ongoing problem in applications for mobile devices (e.g., music genre classification on mobile phones). Traditional methods, however, require domain expertise to extract handcrafted features from the time-series data. Deep learning has been demonstrated to be effective and efficient on this kind of data. Nevertheless, existing works neglect some of the sequential relationships in time-series data, which are significant for time-series classification. Considering these limitations, we propose a hybrid architecture, named the parallel recurrent convolutional neural network (PRCNN). The PRCNN is an end-to-end network that combines feature extraction and time-series classification in a single stage. The parallel CNN and Bi-RNN blocks extract the spatial features and the temporal frame order, respectively, and the outputs of the two blocks are fused into one powerful representation of the time-series data. The fused vector is then fed into a softmax function for classification. The parallel network structure ensures that the extracted features are robust enough to represent the time-series data. Moreover, the experimental results demonstrate that the proposed architecture outperforms previous approaches on the same datasets. Taking music data as an example, we also conduct contrastive experiments to verify that the additional parallel Bi-RNN block improves time-series classification performance compared with using a CNN alone.
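
The abstract only outlines the architecture (a CNN branch for spatial features, a Bi-RNN branch for temporal frame order, feature fusion, then softmax). As a rough illustration of that idea, and not the authors' implementation, a minimal PyTorch sketch might look like the following. The mel-spectrogram input format, all layer sizes, and the choice of a bidirectional GRU are assumptions made for the example.

```python
import torch
import torch.nn as nn

class PRCNNSketch(nn.Module):
    """Hypothetical sketch of a parallel CNN / Bi-RNN classifier.

    Input: a batch of mel-spectrograms shaped (batch, 1, n_mels, n_frames).
    The CNN branch extracts spatial features, the Bi-RNN branch models the
    temporal frame order, and the two are concatenated and classified.
    """

    def __init__(self, n_mels=128, n_classes=10, rnn_hidden=128):
        super().__init__()
        # CNN branch: conv/pool blocks over the 2-D spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 64, 1, 1)
        )
        # Bi-RNN branch: each time frame is one step of n_mels features.
        self.birnn = nn.GRU(input_size=n_mels, hidden_size=rnn_hidden,
                            batch_first=True, bidirectional=True)
        # Fused representation -> class logits (softmax applied afterwards,
        # or implicitly via CrossEntropyLoss during training).
        self.classifier = nn.Linear(64 + 2 * rnn_hidden, n_classes)

    def forward(self, x):
        # CNN branch on the spectrogram image.
        spatial = self.cnn(x).flatten(1)               # (batch, 64)
        # Bi-RNN branch on the frame sequence: (batch, n_frames, n_mels).
        frames = x.squeeze(1).transpose(1, 2)
        _, h = self.birnn(frames)                      # h: (2, batch, rnn_hidden)
        temporal = torch.cat([h[0], h[1]], dim=1)      # (batch, 2*rnn_hidden)
        # Fuse the two branches into one representation.
        fused = torch.cat([spatial, temporal], dim=1)
        return self.classifier(fused)                  # logits

model = PRCNNSketch()
logits = model(torch.randn(4, 1, 128, 256))  # 4 clips, 128 mels, 256 frames
probs = torch.softmax(logits, dim=1)         # genre probabilities
```

The key design point the abstract emphasizes is that the two branches run in parallel rather than in sequence, so the fused vector carries both spatial and temporal evidence; the sketch mirrors that by concatenating the branch outputs before the final linear layer.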
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2968170