Clinical application potential of large language model: a study based on thyroid nodules


Bibliographic details
Published in: Endocrine 2025, Vol. 87 (1), p. 206-213
Main authors: Xia, Shujun; Hua, Qing; Mei, Zihan; Xu, Wenwen; Lai, Limei; Wei, Minyan; Qin, Yu; Luo, Lin; Wang, Changhua; Huo, ShengNan; Fu, Lijun; Zhou, Feidu; Wu, Jiang; Zhang, Li; Lv, De; Li, Jianxin; Wang, Xin; Li, Ning; Song, Yanyan; Zhou, Jianqiao
Format: Article
Language: English
Description
Abstract: Background: Limited data are available on the performance of large language models (LLMs) taking on the role of doctors. We aimed to investigate the potential of ChatGPT-3.5 and New Bing Chat to act as doctors, using thyroid nodules as an example. Methods: A total of 145 patients with thyroid nodules were included for generating questions. Each question was entered into the ChatGPT-3.5 and New Bing Chat chatbots five times, yielding five responses from each. These responses were compared with answers given by five junior doctors. Responses from five senior doctors were regarded as the gold standard. The accuracy and reproducibility of the responses from ChatGPT-3.5 and New Bing Chat were evaluated. Results: The accuracy of ChatGPT-3.5 and New Bing Chat in answering Q2, Q3, and Q5 was lower than that of junior doctors (all P 
ISSN: 1559-0100
DOI: 10.1007/s12020-024-03981-3