Hybrid feature engineering of medical data via variational autoencoders with triplet loss: a COVID-19 prognosis study

Bibliographic details
Published in: Scientific reports 2023-02, Vol.13 (1), p.2827-2827, Article 2827
Main authors: Mahdavi, Mahdi, Choubdar, Hadi, Rostami, Zahra, Niroomand, Behnaz, Levine, Alexandra T., Fatemi, Alireza, Bolhasani, Ehsan, Vahabie, Abdol-Hossein, Lomber, Stephen G., Merrikhi, Yaser
Format: Article
Language: English
Online access: Full text
Description
Summary: Medical machine learning frameworks have received much attention in recent years. The recent COVID-19 pandemic was also accompanied by a surge in proposed machine learning algorithms for tasks such as diagnosis and mortality prognosis. Machine learning frameworks can be helpful medical assistants by extracting data patterns that are otherwise hard for humans to detect. Efficient feature engineering and dimensionality reduction are major challenges in most medical machine learning frameworks. Autoencoders are novel unsupervised tools that can perform data-driven dimensionality reduction with minimal prior assumptions. This study takes a novel approach, investigating the predictive power of latent representations obtained from a hybrid autoencoder (HAE) framework that combines variational autoencoder (VAE) characteristics with mean squared error (MSE) and triplet loss, for identifying COVID-19 patients at high mortality risk in a retrospective framework. Electronic laboratory and clinical data of 1474 patients were used in the study. Logistic regression with elastic net regularization (EN) and random forest (RF) models were used as final classifiers. We also investigated the contribution of the features used to the latent representations via mutual information analysis. The HAE latent representation model achieved decent performance, with an area under the ROC curve (AUC) of 0.921 (±0.027) and 0.910 (±0.036) with the EN and RF predictors, respectively, on the hold-out data, compared with the models trained on raw features (AUC EN: 0.913 (±0.022); RF: 0.903 (±0.020)). The study aims to provide an interpretable feature engineering framework for the medical environment with the potential to integrate imaging data for efficient feature engineering in rapid triage and other clinical predictive models.
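The summary names three loss components (a VAE term, MSE reconstruction, and triplet loss) without saying how they are combined. The following is a minimal sketch of one plausible combination for a single sample, not the authors' implementation. The function names, the weights `beta` and `gamma`, the `margin`, the use of the latent mean as the triplet anchor, and the Euclidean distance are all assumptions made for illustration.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def triplet(anchor, positive, negative, margin=1.0):
    """Triplet loss: pull same-class latents together, push other-class latents apart."""
    d_pos = math.dist(anchor, positive)  # distance to a same-outcome patient
    d_neg = math.dist(anchor, negative)  # distance to a different-outcome patient
    return max(0.0, d_pos - d_neg + margin)


def hybrid_loss(x, x_hat, mu, log_var, positive, negative,
                beta=1.0, gamma=1.0, margin=1.0):
    """Weighted sum of the three terms; beta and gamma are hypothetical weights."""
    return (mse(x, x_hat)
            + beta * kl_to_standard_normal(mu, log_var)
            + gamma * triplet(mu, positive, negative, margin))
```

In this sketch, `positive` and `negative` are latent vectors of patients with the same and the opposite outcome as the anchor. How such triplets are selected and how the terms are weighted are not stated in the summary, so those choices would need to be taken from the full text.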
ISSN:2045-2322
DOI:10.1038/s41598-023-29334-0