Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data
Format: Article
Language: English
Abstract:
Background: This study aimed to evaluate and compare the performance of
classical machine learning models (CMLs) and large language models (LLMs) in
predicting COVID-19 mortality using a high-dimensional tabular dataset.
Materials and Methods: We analyzed data from 9,134 COVID-19 patients
collected across four hospitals. Seven CML models, including XGBoost and random
forest (RF), were trained and evaluated. The structured data was converted into
text for zero-shot classification by eight LLMs, including GPT-4 and
Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to
enhance its predictive capabilities.
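As a rough illustration of the tabular-to-text step described above, the sketch
below serializes one patient record into a prompt for zero-shot classification.
The column names, example values, prompt wording, and the helper name
row_to_prompt are hypothetical; the abstract does not give the authors' actual
feature set or prompt.

```python
# Minimal sketch (not the authors' pipeline): flattening one tabular record
# into a text prompt for zero-shot mortality classification by an LLM.
# All feature names and values here are illustrative placeholders.

def row_to_prompt(row: dict) -> str:
    """Turn a feature dictionary into a readable clinical summary plus a question."""
    features = "; ".join(f"{name}: {value}" for name, value in row.items())
    return (
        "Patient record: " + features + "\n"
        "Question: Based on this record, will the patient die of COVID-19? "
        "Answer with exactly one word, 'yes' or 'no'."
    )

# Example with made-up values; the real dataset has many more columns.
example = {"age": 71, "sex": "male", "oxygen_saturation": 88,
           "creatinine": 1.9, "diabetes": "yes"}
print(row_to_prompt(example))
```

The resulting string would then be sent to an LLM such as GPT-4 or Mistral-7b,
and the one-word answer parsed into a binary mortality label.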
Results: Among the CML models, XGBoost and RF achieved the highest accuracy,
with F1 scores of 0.87 for internal validation and 0.83 for external
validation. In the LLM category, GPT-4 was the top performer with an F1 score
of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to
79%, resulting in an F1 score of 0.74, which remained stable under external
validation.
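For context, the F1 score is the harmonic mean of precision (P) and recall (R).
Working backwards from the reported recall (0.79) and F1 (0.74) of the
fine-tuned Mistral-7b, the implied precision is roughly 0.70; this is an
inference from the reported numbers, not a figure stated in the abstract:

```latex
F_1 = \frac{2PR}{P+R}
\quad\Longrightarrow\quad
P = \frac{F_1 R}{2R - F_1}
  = \frac{0.74 \times 0.79}{2(0.79) - 0.74}
  \approx 0.70
```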
Conclusion: While LLMs show moderate performance in zero-shot classification,
fine-tuning can significantly enhance their effectiveness, potentially bringing
them closer to CML models. However, CMLs still outperform LLMs on
high-dimensional tabular data tasks.
DOI: 10.48550/arxiv.2409.02136