Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage?

Chat-GPT is rapidly emerging as a promising and potentially revolutionary tool in medicine. One of its possible applications is the stratification of patients according to the severity of clinical conditions and prognosis during the triage evaluation in the emergency department (ED). Using a randoml...

Bibliographic details
Published in: The American journal of emergency medicine 2024-05, Vol.79, p.44-47
Main authors: Zaboli, Arian; Brigo, Francesco; Sibilio, Serena; Mian, Michael; Turcato, Gianni
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 47
container_issue
container_start_page 44
container_title The American journal of emergency medicine
container_volume 79
creator Zaboli, Arian
Brigo, Francesco
Sibilio, Serena
Mian, Michael
Turcato, Gianni
description Chat-GPT is rapidly emerging as a promising and potentially revolutionary tool in medicine. One of its possible applications is the stratification of patients according to the severity of clinical conditions and prognosis during the triage evaluation in the emergency department (ED). Using a randomly selected sample of 30 vignettes recreated from real clinical cases, we compared the concordance in risk stratification of ED patients between healthcare personnel and Chat-GPT. The concordance was assessed with Cohen's kappa, and the performance was evaluated with the area under the receiver operating characteristic curve (AUROC). Among the outcomes, we considered mortality within 72 h, the need for hospitalization, and the presence of a severe or time-dependent condition. The concordance in triage code assignment between triage nurses and Chat-GPT was 0.278 (unweighted Cohen's kappa; 95% confidence interval: 0.231–0.388). For all outcomes, the AUROC values were higher for the triage nurses. The most relevant difference was found in 72-h mortality, where triage nurses showed an AUROC of 0.910 (0.757–1.000) compared to only 0.669 (0.153–1.000) for Chat-GPT. The current level of Chat-GPT reliability is insufficient to make it a valid substitute for the expertise of triage nurses in prioritizing ED patients. Further developments are required to enhance the safety and effectiveness of AI for risk stratification of ED patients.
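The two statistics named in the description can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names `cohen_kappa` and `auroc` and the toy inputs are assumptions introduced here, and the example values are invented for demonstration rather than taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def auroc(scores, outcomes):
    """AUROC as the probability that a case with the outcome receives a
    higher score than a case without it; ties count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): triage codes from two raters,
# plus an urgency score (higher = more urgent) and a binary outcome.
nurse_codes = [1, 2, 2, 3, 3, 4]
model_codes = [1, 2, 3, 3, 4, 4]
urgency = [4, 3, 3, 2, 1, 1]
outcome = [1, 1, 0, 0, 0, 0]

print(cohen_kappa(nurse_codes, model_codes))
print(auroc(urgency, outcome))
```

The pairwise AUROC formulation is chosen here because it needs no external library and handles tied scores explicitly; it is quadratic in the number of cases, which is acceptable for a sample on the order of 30 vignettes. Confidence intervals such as those reported in the description would need an additional step (for example bootstrapping), which this sketch does not cover.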
doi_str_mv 10.1016/j.ajem.2024.02.008
format Article
addtitle Am J Emerg Med
els_id S0735675724000664
linktohtml https://www.sciencedirect.com/science/article/pii/S0735675724000664
backlink https://www.ncbi.nlm.nih.gov/pubmed/38341993 (View this record in MEDLINE/PubMed)
pmid 38341993
pqid 3037804668
publisher United States: Elsevier Inc
rights Copyright © 2024 Elsevier Inc. All rights reserved.
toplevel peer_reviewed; online_resources
tpages 4
fulltext fulltext
identifier ISSN: 0735-6757
ispartof The American journal of emergency medicine, 2024-05, Vol.79, p.44-47
issn 0735-6757
1532-8171
1532-8171
language eng
recordid cdi_proquest_miscellaneous_2925484195
source MEDLINE; Elsevier ScienceDirect Journals
subjects Advanced nurse practice
Artificial intelligence
ChatGPT
Codes
Emergency medical care
Emergency Service, Hospital
Hospitalization
Humans
Kappa coefficient
Manchester triage system
Medical personnel
Mortality
Nurses
Nursing
Patient safety
Patients
Performance evaluation
Physicians
Reproducibility of Results
Triage
title Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage?
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T00%3A55%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Human%20intelligence%20versus%20Chat-GPT:%20who%20performs%20better%20in%20correctly%20classifying%20patients%20in%20triage?&rft.jtitle=The%20American%20journal%20of%20emergency%20medicine&rft.au=Zaboli,%20Arian&rft.date=2024-05&rft.volume=79&rft.spage=44&rft.epage=47&rft.pages=44-47&rft.issn=0735-6757&rft.eissn=1532-8171&rft_id=info:doi/10.1016/j.ajem.2024.02.008&rft_dat=%3Cproquest_cross%3E2925484195%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3037804668&rft_id=info:pmid/38341993&rft_els_id=S0735675724000664&rfr_iscdi=true