Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study

Bibliographic Details
Published in: Medicine (Baltimore), 2024-05-31, Vol. 103 (22), p. e38352
Authors: Ömür Arça, Dilek; Erdemir, İsmail; Kara, Fevzi; Shermatov, Nurgazy; Odacioğlu, Mürüvvet; İbişoğlu, Emel; Hanci, Ferid Baran; Sağiroğlu, Gönül; Hanci, Volkan
Format: Article
Language: English
Subjects: Artificial Intelligence; Cardiopulmonary Resuscitation - methods; Cardiopulmonary Resuscitation - standards; Comprehension; Cross-Sectional Studies; Humans; Reproducibility of Results; Surveys and Questionnaires
Online access: Full text
DOI: 10.1097/MD.0000000000038352
ISSN: 0025-7974
EISSN: 1536-5964
PMID: 39259094

Abstract
This study aimed to evaluate the readability, reliability, and quality of responses by 4 selected artificial intelligence (AI)-based large language model (LLM) chatbots to questions related to cardiopulmonary resuscitation (CPR). This was a cross-sectional study. Responses to the 100 most frequently asked questions about CPR by 4 selected chatbots (ChatGPT-3.5 [OpenAI], Google Bard [Google AI], Google Gemini [Google AI], and Perplexity [Perplexity AI]) were analyzed for readability, reliability, and quality. The chatbots were asked, in English: "What are the 100 most frequently asked questions about cardiopulmonary resuscitation?" Each of the 100 queries derived from the responses was then posed individually to the 4 chatbots. The 400 responses, treated as patient education materials (PEMs), were assessed for quality and reliability using the modified DISCERN questionnaire, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Score (GQS). Readability was assessed with 2 different calculators, which independently computed scores using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Simple Measure of Gobbledygook, Gunning Fog Index, and Automated Readability Index.

One hundred responses from each of the 4 chatbots were analyzed. When the median readability values from Calculators 1 and 2 were compared with the 6th-grade reading level, there was a highly significant difference between the groups (P < .001). Across all formulas, the readability level of the responses was above the 6th grade. Ordered from easiest to most difficult to read, the chatbots ranked Bard, Perplexity, Gemini, and ChatGPT-3.5.

The readability of the text content provided by all 4 chatbots was above the 6th-grade level. We believe that enhancing the quality, reliability, and readability of PEMs will make them easier for readers to understand and support more accurate performance of CPR, so that patients who receive bystander CPR may have an increased likelihood of survival.
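For context on the metrics named in the abstract: all five are surface-level formulas over counts of sentences, words, syllables, and characters. The sketch below is a minimal illustration in Python, not the study's calculators (whose implementations are unspecified), and it uses a deliberately crude syllable heuristic:

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; real calculators use dictionaries or better rules."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_scores(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    n_chars = sum(len(w) for w in words)
    syllables = [count_syllables(w) for w in words]
    n_syll = sum(syllables)
    n_poly = sum(1 for s in syllables if s >= 3)  # "complex" (polysyllabic) words

    wps = n_words / sentences  # words per sentence
    spw = n_syll / n_words     # syllables per word

    return {
        # Flesch Reading Ease: higher = easier (roughly 80+ ~ 6th grade or below)
        "FRES": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Simple Measure of Gobbledygook (SMOG)
        "SMOG": 1.0430 * (n_poly * 30 / sentences) ** 0.5 + 3.1291,
        # Gunning Fog Index
        "GFI": 0.4 * (wps + 100 * n_poly / n_words),
        # Automated Readability Index: character-based
        "ARI": 4.71 * (n_chars / n_words) + 0.5 * wps - 21.43,
    }

if __name__ == "__main__":
    sample = "Push hard and fast in the center of the chest. Call for help."
    print(readability_scores(sample))
```

On these scales, text at or below the 6th-grade threshold the study compares against corresponds roughly to FKGL of 6 or less, or a Flesch Reading Ease score of about 80 or higher.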