Feature Distribution-Based Medical Data Augmentation: Enhancing Mood Disorder Classification

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 127782-127791
Main authors: Hun Yoo, Joo; Hyun An, Ji; Chung, Tai-Myoung
Format: Article
Language: English
Description
Summary: Classification models based on deep learning or machine learning algorithms require a sufficiently large and balanced training dataset to perform well, yet collecting such data is often hampered by data privacy issues. In medical research, where most data variables are sensitive, collecting enough training data to improve model performance is even more challenging. This study presents a new four-step medical data augmentation algorithm that addresses data shortage and class imbalance. The main idea of the proposed algorithm is to reflect the core characteristics of each class label in the original data. The algorithm takes the original dataset as input, extracts feature vectors, and trains individual autoencoder models. It then verifies each augmented feature vector through a distributional equality check and concatenates the verified feature vectors into a single feature vector. Finally, deep learning model inference is applied to the concatenated vector as a second verification to finalize the augmented training dataset. To evaluate the algorithm, our team performed mood disorder classification using patient data. With the method, classification performance improved by 0.059 in the severity classification of major depressive disorder, 0.041 in the severity classification of anxiety disorder, and 0.073 in the subtype classification of bipolar disorder. These results indicate that the algorithm can be applied to reduce model bias and improve classification performance on medical data that are imbalanced or insufficient in number for some classes.
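The four steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a resample-plus-Gaussian-noise generator stands in for the per-feature autoencoder, a two-sample Kolmogorov-Smirnov test stands in for the distributional equality check (the summary does not name the test), and a nearest-centroid classifier stands in for the deep learning model used in the second verification. All function names and the toy data are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def same_distribution(original, augmented, c_alpha=1.358):
    """Assumed check: accept when the KS statistic is below the asymptotic
    critical value for a significance level of about 0.05."""
    n, m = len(original), len(augmented)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(original, augmented) <= critical


def augment_feature(values, n_new, rng, noise_scale=0.1):
    """Stand-in for the per-feature autoencoder: resample and add small noise."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [rng.choice(values) + rng.gauss(0.0, noise_scale * sd) for _ in range(n_new)]


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def predict(row, centroids):
    """Stand-in for deep learning inference: nearest class centroid."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def augment_class(data, target, n_new, rng, max_tries=20):
    """data maps a class label to a list of rows (lists of floats)."""
    columns = [list(col) for col in zip(*data[target])]  # step 1: per-feature vectors
    new_columns = []
    for col in columns:
        for _ in range(max_tries):
            candidate = augment_feature(col, n_new, rng)  # step 2: generate
            if same_distribution(col, candidate):         # step 3: first verification
                break
        else:
            raise RuntimeError("no candidate passed the distribution check")
        new_columns.append(candidate)
    rows = [list(r) for r in zip(*new_columns)]           # concatenate features
    centroids = {label: centroid(rs) for label, rs in data.items()}
    # Step 4: keep only rows the classifier assigns to the target class.
    return [r for r in rows if predict(r, centroids) == target]


# Toy, synthetic data: a majority class and a minority class to be augmented.
rng = random.Random(0)
data = {
    "mild": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)],
    "severe": [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(10)],
}
new_rows = augment_class(data, "severe", 30, rng)
```

Because each feature is generated independently in this sketch, the concatenated rows may not preserve correlations between features; the final classifier-based filter is what rejects rows that no longer look like the target class.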
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3396138