Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

Bibliographic Details
Published in:	arXiv.org 2023-11
Main Authors:	Bae, Sangmin; Kim, June-Woo; Cho, Won-Yang; Baek, Hyerim; Son, Soyoun; Lee, Byungjo; Ha, Changwan; Tae, Kyongpil; Kim, Sungnyun; Yun, Se-Young
Format:	Article
Language:	eng
Subjects:	Audio data; Classification; Computer Science - Learning; Computer Science - Sound; Datasets; Deep learning; Health services; Sound; Stethoscopes; Transformers
Online Access:	Full text

Description
Summary:	Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, the task remains challenging due to the scarcity of medical data. In this study, we demonstrate that models pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by 4.08%.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2305.14032
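
Illustrative note: the summary describes Patch-Mix as randomly mixing patches between different samples. The sketch below shows one way that idea could look on a batch of already-patchified inputs. It is not the authors' implementation. The function name patch_mix, the parameter lam (fraction of patches kept from the original sample), and the choice of partner by random permutation are all assumptions made for illustration. The contrastive objective is not shown.

```python
import random


def patch_mix(batch, lam, rng=None):
    """Swap a random subset of patches in each sample with a partner's patches.

    batch: list of samples; each sample is a list of patches (equal length).
    lam:   hypothetical mixing ratio, the fraction of patches kept from the
           original sample.
    Returns (mixed, partners, masks); masks[i][j] is True when patch j of
    sample i was taken from sample partners[i].
    """
    rng = rng or random.Random()
    num_samples = len(batch)
    num_patches = len(batch[0])
    num_swapped = round((1.0 - lam) * num_patches)

    # Assumption: partners come from a random permutation of the batch,
    # so a sample may occasionally be paired with itself.
    partners = list(range(num_samples))
    rng.shuffle(partners)

    mixed, masks = [], []
    for i, sample in enumerate(batch):
        swapped = set(rng.sample(range(num_patches), num_swapped))
        mask = [j in swapped for j in range(num_patches)]
        partner = batch[partners[i]]
        mixed.append(
            [partner[j] if mask[j] else sample[j] for j in range(num_patches)]
        )
        masks.append(mask)
    return mixed, partners, masks


# Toy batch: 4 samples of 8 labelled "patches" each.
batch = [[f"s{i}p{j}" for j in range(8)] for i in range(4)]
mixed, partners, masks = patch_mix(batch, lam=0.75, rng=random.Random(0))
```

The returned masks and partner indices are kept so that a training loop could weight the labels of the two source samples by the mixing ratio. How the paper actually uses them is not stated in this record.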