Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment

Bibliographic Details
Published in: Canadian Association of Radiologists journal, 2024-05, Vol. 75 (2), p. 344-350
Main authors: Patil, Nikhil S., Huang, Ryan S., van der Pol, Christian B., Larocque, Natasha
Format: Article
Language: English
Online access: Full text
container_end_page 350
container_issue 2
container_start_page 344
container_title Canadian Association of Radiologists journal
container_volume 75
creator Patil, Nikhil S.
Huang, Ryan S.
van der Pol, Christian B.
Larocque, Natasha
description Purpose Bard by Google, a direct competitor to ChatGPT, was recently released. Understanding the relative performance of these different chatbots can provide important insight into their strengths and weaknesses, as well as which roles they are most suited to fill. In this project, we aimed to compare the most recent version of ChatGPT, ChatGPT-4, and Bard by Google in their ability to accurately respond to radiology board examination practice questions. Methods Text-based questions were collected from the 2017-2021 American College of Radiology's Diagnostic Radiology In-Training (DXIT) examinations. ChatGPT-4 and Bard were queried, and their comparative accuracies, response lengths, and response times were documented. Subspecialty-specific performance was also analyzed. Results 318 questions were included in our analysis. ChatGPT answered significantly more accurately than Bard (87.11% vs 70.44%, P < .0001). ChatGPT's response length was significantly shorter than Bard's (935.28 ± 440.88 characters vs 1437.52 ± 415.91 characters, P < .0001). ChatGPT's response time was significantly longer than Bard's (26.79 ± 3.27 seconds vs 7.55 ± 1.88 seconds, P < .0001). ChatGPT performed better than Bard in neuroradiology (100.00% vs 86.21%, P = .03), general & physics (85.39% vs 68.54%, P < .001), nuclear medicine (80.00% vs 56.67%, P < .01), pediatric radiology (93.75% vs 68.75%, P = .03), and ultrasound (100.00% vs 63.64%, P < .001). In the remaining subspecialties, there were no significant differences between ChatGPT's and Bard's performance. Conclusion ChatGPT displayed superior radiology knowledge compared to Bard. While both chatbots display reasonable radiology knowledge, they should be used with awareness of their limitations and fallibility. Both chatbots provided incorrect or illogical answer explanations and did not always address the educational content of the question. Graphical Abstract
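The headline result in the description above is a comparison of two accuracy proportions (87.11% vs 70.44% over the same 318 questions). The abstract does not state which statistical test the authors used, so the snippet below is only a minimal sanity-check sketch, assuming a chi-squared test on correct-answer counts back-calculated from the reported percentages; the test choice and the rounding are assumptions, not the study's actual method.

```python
# Hypothetical sketch (not the authors' analysis): back-calculate the
# correct-answer counts from the reported accuracies and compare them
# with a chi-squared test. Assumes both models answered all 318 questions.
from scipy.stats import chi2_contingency

N_QUESTIONS = 318
chatgpt_correct = round(0.8711 * N_QUESTIONS)  # ~277 correct answers
bard_correct = round(0.7044 * N_QUESTIONS)     # ~224 correct answers

# 2x2 contingency table: rows = model, columns = correct / incorrect
table = [
    [chatgpt_correct, N_QUESTIONS - chatgpt_correct],
    [bard_correct, N_QUESTIONS - bard_correct],
]

chi2, p_value, dof, _expected = chi2_contingency(table)
print(f"ChatGPT accuracy: {chatgpt_correct / N_QUESTIONS:.2%}")
print(f"Bard accuracy:    {bard_correct / N_QUESTIONS:.2%}")
print(f"chi-squared = {chi2:.2f}, p = {p_value:.2e}")
```

Under these assumptions the resulting p-value falls well below .0001, consistent with the significance level reported in the abstract; a two-proportion z-test, or McNemar's test on the paired per-question results, would be equally plausible alternatives.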
doi_str_mv 10.1177/08465371231193716
format Article
publisher SAGE Publications, Los Angeles, CA
pmid 37578849
rights The Author(s) 2023
orcidid 0000-0003-3929-0482 ; 0000-0002-3404-5376 ; 0000-0002-1718-2619 ; 0000-0003-0449-9438
fulltext fulltext
identifier ISSN: 0846-5371
ispartof Canadian Association of Radiologists journal, 2024-05, Vol.75 (2), p.344-350
issn 0846-5371
1488-2361
language eng
recordid cdi_proquest_miscellaneous_2851141530
source SAGE Complete A-Z List
title Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment