The Effect of Noise on Deep Learning for Classification of Pathological Voice

Detailed description

Bibliographic details
Published in: The Laryngoscope 2024-08, Vol.134 (8), p.3537-3541
Main authors: Hasebe, Koki; Fujimura, Shintaro; Kojima, Tsuyoshi; Tamura, Keiichi; Kawai, Yoshitaka; Kishimoto, Yo; Omori, Koichi
Format: Article
Language: English
Online access: Full text
Description
Abstract:
Objective: This study aimed to evaluate the significance of background noise in machine learning models assessing the GRBAS scale for voice disorders.
Methods: A dataset of 1406 voice samples was collected from retrospective data, and a 5‐layer 1D convolutional neural network (CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to test samples at various intensities to assess the model's noise resilience. The model's performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa score.
Results: The model's performance on the GRBAS scale generally declined with increasing noise intensities. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components.
Conclusion: The model's performance was affected by background noise, with substantial decreases in evaluation metrics as noise levels intensified. Future research should explore noise‐tolerant techniques, such as data augmentation, to improve the model's noise resilience in real‐world settings.
Level of Evidence: This study evaluates a machine learning model using a single dataset without comparative controls. Given its non‐comparative design and specific focus, it aligns with Level 4 evidence (Case‐series) under the 2011 OCEBM guidelines. Laryngoscope, 134:3537–3541, 2024.
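The evaluation described in the abstract (adding Gaussian noise to test samples, then scoring predictions) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the use of a standard deviation to express noise intensity, and the assumption of four ordinal grades (0 to 3) per GRBAS component are all illustrative choices that the abstract does not specify. The F1 score is left out because the abstract does not state which averaging was used.

```python
import random


def add_gaussian_noise(signal, noise_std, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    `noise_std` is an assumed way to express noise intensity; the
    abstract does not say how intensity was parameterized.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the true grade."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic weights for grades 0..n_classes-1.

    n_classes=4 is an assumption (grades 0 to 3).
    """
    n = len(y_true)
    # Confusion matrix: rows are true grades, columns are predicted grades.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty: larger grade disagreements cost more.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Degenerate case (every label and prediction in one class);
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return 1.0 - weighted_observed / weighted_expected
```

In this sketch, a classifier that outputs the same grade for every sample gets a kappa of 0.0 regardless of its accuracy. That is one way a kappa of 0.0 like the one reported at the highest noise level could arise, though the abstract does not say that this is what happened.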
ISSN:0023-852X
1531-4995
DOI:10.1002/lary.31303