BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Respiratory sound classification (RSC) is challenging due to varied acoustic
signatures, primarily influenced by patient demographics and recording
environments. To address this issue, we introduce a text-audio multimodal model
that utilizes metadata of respiratory sounds, which provides useful
complementary information for RSC. Specifically, we fine-tune a pretrained
text-audio multimodal model using free-text descriptions derived from the sound
samples' metadata, which includes the gender and age of patients, the type of
recording device, and the recording location on the patient's body. Our method
achieves state-of-the-art performance on the ICBHI dataset, surpassing the
previous best result by a notable margin of 1.17%. This result validates the
effectiveness of leveraging metadata and respiratory sound samples in enhancing
RSC performance. Additionally, we investigate the model performance in the case
where metadata is partially unavailable, which may occur in real-world clinical
settings. |
---|---|
DOI: | 10.48550/arxiv.2406.06786 |
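
The summary describes deriving free-text descriptions from each sample's metadata (patient gender and age, recording device, recording location) and notes that some fields may be unavailable. A minimal sketch of that idea follows. The function name `describe_sample`, the sentence template, and the example field values are illustrative assumptions, not the authors' actual prompt format.

```python
from typing import Optional


def describe_sample(
    age: Optional[str] = None,
    sex: Optional[str] = None,
    location: Optional[str] = None,
    device: Optional[str] = None,
) -> str:
    """Build a free-text description from whatever metadata is available.

    Hypothetical template: fields that are None are simply left out,
    which mimics the partially-unavailable-metadata case.
    """
    # Patient phrase, e.g. "an adult male patient"; falls back to "a patient".
    traits = " ".join(t for t in (age, sex) if t)
    patient = f"{traits} patient" if traits else "patient"
    article = "an" if patient[0].lower() in "aeiou" else "a"
    text = f"Respiratory sound of {article} {patient}"

    # Recording details are appended only when present.
    details = []
    if location:
        details.append(f"at the {location}")
    if device:
        details.append(f"with a {device}")
    if details:
        text += ", recorded " + " ".join(details)
    return text + "."


full = describe_sample("adult", "male", "left anterior chest", "digital stethoscope")
partial = describe_sample(sex="female", device="digital stethoscope")
empty = describe_sample()
```

In a setup like the one summarized above, such a string would be passed to the text encoder of a pretrained text-audio model alongside the audio clip; omitting fields lets the same template cover records with incomplete metadata.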