Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists

Aim: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well studied.
Method: In the first step, we compared the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the 2022 Taiwan Psychiatric Licensing Examination, conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.
Result: Only GPT‐4 passed the 2022 Taiwan Psychiatric Licensing Examination (scoring 69, with ≥60 considered a passing grade), while Bard scored 36 and Llama‐2 scored 25. GPT‐4 outperformed Bard and Llama‐2, especially in the areas of 'Pathophysiology & Epidemiology' (χ² = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ² = 15.8, P < 0.001). In the differential diagnosis task, the mean score of the 24 experienced psychiatrists (6.1, standard deviation 1.9) was higher than that of GPT‐4 (5), Bard (3), and Llama‐2 (1).
Conclusion: Compared to Bard and Llama‐2, GPT‐4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments, and its performance in differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT‐4 showed promising potential as a valuable tool in psychiatric practice.
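
The abstract reports domain-level chi-square comparisons between models; the study's own data and analysis code are not part of this record. As a minimal illustration only, the Python sketch below shows how a comparison of correct-versus-incorrect answer counts for one exam domain could be run as a chi-square test; all counts are hypothetical placeholders, not the study's data.

```python
# Minimal illustrative sketch, not the authors' analysis code: the study's data
# are not part of this record, so the counts below are hypothetical placeholders.
# It shows how a domain-level comparison of two models' correct/incorrect answer
# counts could be tested with a chi-square test, in the spirit of the chi-square
# statistics quoted in the abstract.
from scipy.stats import chi2_contingency

# Rows = models, columns = (correct, incorrect) for one hypothetical exam domain.
counts = [
    [18, 2],   # hypothetical GPT-4 tally
    [7, 13],   # hypothetical Bard tally
]

chi2, p_value, dof, _expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.1f}, dof = {dof}, P = {p_value:.3g}")
```

For a 2 × 2 table like this, scipy applies Yates' continuity correction by default; the same call generalizes to more models or answer categories.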

Bibliographic details
Published in: Psychiatry and clinical neurosciences, 2024-06, Vol. 78 (6), p. 347-352
Main authors: Li, Dian‐Jeng; Kao, Yu‐Chen; Tsai, Shih‐Jen; Bai, Ya‐Mei; Yeh, Ta‐Chuan; Chu, Che‐Sheng; Hsu, Chih‐Wei; Cheng, Szu‐Wei; Hsu, Tien‐Wei; Liang, Chih‐Sung; Su, Kuan‐Pin
Format: Article
Language: English
Publisher: John Wiley & Sons Australia, Ltd (Melbourne)
ISSN: 1323-1316
EISSN: 1440-1819
DOI: 10.1111/pcn.13656
PMID: 38404249
Subjects: chatbot; Chatbots; ChatGPT; Differential diagnosis; differential diagnosis in psychiatry; Epidemiology; Licenses; Licensing examinations; psychiatric application; Psychiatrists; Taiwanese psychiatric licensing examination
Source: Open Access Titles of Japan; Wiley Online Library All Journals
Rights: 2024 The Authors. Psychiatry and Clinical Neurosciences © 2024 Japanese Society of Psychiatry and Neurology
Online access: Full text