Deep Generative Replay With Denoising Diffusion Probabilistic Models for Continual Learning in Audio Classification
Published in: | IEEE Access, 2024, Vol. 12, pp. 134714-134727 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Accurate classification of audio data is essential in various fields such as speech recognition, safety management, healthcare, security, and surveillance. However, existing deep learning classifiers typically require extensive pre-collected data and struggle to adapt to the emergence of new audio classes over time. To address these challenges, this paper proposes a continual learning method utilizing Diffusion-driven Generative Replay (DDGR). The proposed DDGR method continuously updates the model at each training stage with high-quality generated data from Denoising Diffusion Probabilistic Models (DDPM), preserving existing knowledge. Furthermore, by embedding disentangled representations through a triplet network, the model can effectively recognize new classes as they emerge. This approach overcomes the problem of catastrophic forgetting and effectively resolves the issue of data scalability in a continual learning setup. The proposed method achieved the highest AIA values of 95.45% and 72.99% on the Audio MNIST and ESC-50 datasets, respectively, compared to existing continual learning methods. Additionally, for Audio MNIST, it showed IM −0.01, FWT 0.27, FM 0.06, and BWT −0.06, indicating that it best preserves prior knowledge while learning new data most effectively. For ESC-50, it demonstrated IM of −0.12, FWT of 0.09, FM of 0.17, and BWT of −0.17. These results validate the efficacy of the DDGR method in maintaining prior knowledge while integrating new information and highlight the complementary role of the triplet network in enhancing feature representation. |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3459954 |
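
The abstract reports AIA, FM, and BWT without defining them. The sketch below assumes they stand for average incremental accuracy, forgetting measure, and backward transfer as commonly defined in the continual learning literature; the paper's exact formulas may differ. The function names and the accuracy values are hypothetical and are not taken from the paper. `acc[k][j]` is taken to be the test accuracy on task `j` after training through stage `k`.

```python
from typing import List

Matrix = List[List[float]]  # acc[k][j]: accuracy on task j after training stage k


def average_incremental_accuracy(acc: Matrix) -> float:
    """Mean over stages of the average accuracy on all tasks seen so far."""
    stage_means = [sum(row[: k + 1]) / (k + 1) for k, row in enumerate(acc)]
    return sum(stage_means) / len(stage_means)


def backward_transfer(acc: Matrix) -> float:
    """Mean change on each old task between learning it and the final stage.

    Requires at least two tasks. Negative values indicate forgetting.
    """
    last = len(acc) - 1
    return sum(acc[last][j] - acc[j][j] for j in range(last)) / last


def forgetting_measure(acc: Matrix) -> float:
    """Mean drop on each old task from its best earlier accuracy to the final one.

    Requires at least two tasks.
    """
    last = len(acc) - 1
    total = 0.0
    for j in range(last):
        best_earlier = max(acc[i][j] for i in range(last))
        total += best_earlier - acc[last][j]
    return total / last


# Illustrative three-task run (made-up numbers, not results from the paper).
example = [
    [0.90, 0.00, 0.00],
    [0.80, 0.92, 0.00],
    [0.70, 0.85, 0.94],
]
print("AIA:", average_incremental_accuracy(example))
print("BWT:", backward_transfer(example))
print("FM: ", forgetting_measure(example))
```

Under these assumed definitions, BWT equals the negative of FM whenever each old task reaches its best accuracy at the stage where it was learned. That would be consistent with the paired values in the abstract (FM 0.06 with BWT −0.06, and FM 0.17 with BWT −0.17), though the abstract alone does not confirm it.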