The scientific knowledge of three large language models in cardiology: multiple-choice questions examination-based performance

Bibliographic details
Published in:	Annals of medicine and surgery 2024-06, Vol.86 (6), p.3261-3266
Main authors:	Altamimi, Ibraheem, Alhumimidi, Abdullah, Alshehri, Salem, Alrumayan, Abdullah, Al-Khlaiwi, Thamir, Meo, Sultan A, Temsah, Mohamad-Hani
Format:	Article
Language:	eng
Subjects:	Original Research
Online access:	Full text

Description
Abstract:	The integration of artificial intelligence (AI) chatbots like Google's Bard, OpenAI's ChatGPT, and Microsoft's Bing Chatbot into academic and professional domains, including cardiology, has been rapidly evolving. Their application in educational and research frameworks, however, raises questions about their efficacy, particularly in specialized fields like cardiology. This study aims to evaluate the knowledge depth and accuracy of these AI chatbots in cardiology using a multiple-choice question (MCQ) format. The study was conducted as an exploratory, cross-sectional study in November 2023 on a bank of 100 MCQs covering various cardiology topics that was created from authoritative textbooks and question banks. These MCQs were then used to assess the knowledge level of Google's Bard, Microsoft Bing, and ChatGPT 4.0. Each question was entered manually into the chatbots, ensuring no memory retention bias. The study found that ChatGPT 4.0 demonstrated the highest knowledge score in cardiology, with 87% accuracy, followed by Bing at 60% and Bard at 46%. The performance varied across different cardiology subtopics, with ChatGPT consistently outperforming the others. Notably, the study revealed significant differences in the proficiency of these chatbots in specific cardiology domains. This study highlights a spectrum of efficacy among AI chatbots in disseminating cardiology knowledge. ChatGPT 4.0 emerged as a potential auxiliary educational resource in cardiology, surpassing traditional learning methods in some aspects. However, the variability in performance among these AI systems underscores the need for cautious evaluation and continuous improvement, especially for chatbots like Bard, to ensure reliability and accuracy in medical knowledge dissemination.
ISSN:	2049-0801
DOI:	10.1097/MS9.0000000000002120