Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists


Published in: Psychiatry and Clinical Neurosciences, 2024-06, Vol. 78 (6), p. 347-352
Authors: Li, Dian‐Jeng; Kao, Yu‐Chen; Tsai, Shih‐Jen; Bai, Ya‐Mei; Yeh, Ta‐Chuan; Chu, Che‐Sheng; Hsu, Chih‐Wei; Cheng, Szu‐Wei; Hsu, Tien‐Wei; Liang, Chih‐Sung; Su, Kuan‐Pin
Format: Article
Language: English
Abstract:
Aim: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, their potential applications in the psychiatric domain have not been well studied.
Method: In the first step, we compared the performance of ChatGPT GPT‐4, Bard, and Llama‐2 on the 2022 Taiwan Psychiatric Licensing Examination, conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.
Result: Only GPT‐4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69, with ≥60 considered a passing grade), while Bard scored 36 and Llama‐2 scored 25. GPT‐4 outperformed Bard and Llama‐2, especially in the area of 'Pathophysiology & Epidemiology' (χ2 = 22.4, P
ISSN: 1323-1316; 1440-1819
DOI: 10.1111/pcn.13656