RETRACTED ARTICLE: Audio fingerprint analysis for speech processing using deep learning method

Detailed description

Bibliographic details
Published in: International journal of speech technology 2022, Vol.25 (3), p.575-581
Main author: Altalbe, Ali
Format: Article
Language: English
Online access: Full text
Description
Summary: Everyday Internet use generates enormous amounts of audio data, which makes accessing and analyzing audio in audio-based applications increasingly complex. Frameworks or supporting tools are therefore needed to retrieve audio data and support intelligent decisions in speech processing. However, the non-stationarity and irregularity of audio signals make segmentation and classification difficult. Audio classification methods are used in many applications, such as speaker identification, gender recognition, music type classification, and natural sound classification. This work proposes a deep learning method based on long short-term memory (LSTM) that can be used for preprocessing, segmentation, and retrieval of audio signals from the GTZAN dataset. The simulation results show that the proposed algorithm can effectively improve audio fingerprint-based data retrieval accuracy and overcome the drawbacks of traditional methods. Compared with existing methods, the proposed LSTM method achieves good results: a precision of 96.54%, a recall of 96.15%, an accuracy of 98.56%, and an F-measure of 0.96. In real-world use, the proposed audio fingerprint recognition system works effectively in voice applications, especially on heterogeneous portable consumer devices or in online distributed audio systems.
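The reported F-measure can be cross-checked against the reported precision and recall. The following is a minimal sketch, assuming the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the summary does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the record's "F-measure" is F1; other F-beta
    variants would weight precision and recall differently.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the summary, expressed as fractions.
reported_precision = 0.9654
reported_recall = 0.9615

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Under this assumption the value rounds to 0.96 as a fraction, or about 96.3 when expressed as a percentage.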
ISSN: 1381-2416, 1572-8110
DOI: 10.1007/s10772-021-09827-x