Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the Medical Knowledge Self-Assessment Program

Bibliographic details
Published in: European heart journal 2024-10, Vol.45 (Supplement_1)
Authors: Malkani, K, Zhang, R, Zhao, A, Jain, R, Collins, G P, Parker, M, Maizes, D, Kini, V
Format: Article
Language: English
Online access: Full text
Description
Summary:
Background: Medical trainees are increasingly using online chat-based artificial intelligence (AI) platforms as supplementary resources for board exam preparation and clinical decision support. Prior studies have evaluated the performance of AI chatbots such as ChatGPT on general standardized tests like the United States Medical Licensing Examination (USMLE), but little is known about their performance on subspecialty-focused exam questions, particularly those related to clinical management and treatment.
Objective: This study aims to evaluate the performance of ChatGPT version 4.0 on the cardiovascular questions from the Medical Knowledge Self-Assessment Program (MKSAP) 19, a widely used board exam preparation resource in the United States.
Methods: We submitted all cardiovascular questions from MKSAP 19, which cover a broad range of cardiology topics in a multiple-choice format, to ChatGPT 4.0. Performance was gauged against both the official MKSAP answer key and average trainee scores obtained from the MKSAP website. Of 129 questions, 4 were invalidated due to post-publication data and 18 were excluded because they relied on visual aids, leaving 107 questions for analysis.
Results: ChatGPT 4.0 correctly answered 93 of 107 questions, an accuracy rate of 87%, compared with a 60% accuracy rate averaged among all human users for the same questions (p
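The question counts and accuracy figure reported above can be checked with a short calculation. The sketch below only reproduces the arithmetic stated in the abstract; the statistical test behind the truncated p-value is not specified in this record, so no test is assumed or reconstructed.

# Minimal sketch: verify the question counts and accuracy rate stated in the abstract.
# All figures come from the abstract itself; the p-value computation is not reproduced
# because the underlying test is not reported here.

total_questions = 129      # all cardiovascular MKSAP 19 questions submitted
invalidated = 4            # invalidated due to post-publication data
excluded_visual = 18       # excluded because they relied on visual aids
correct_chatgpt = 93       # questions ChatGPT 4.0 answered correctly

analyzed = total_questions - invalidated - excluded_visual
accuracy = correct_chatgpt / analyzed

print(f"Questions analyzed: {analyzed}")           # 107
print(f"ChatGPT 4.0 accuracy: {accuracy:.1%}")     # ~86.9%, reported as 87%
print("Reported average human accuracy: 60%")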
ISSN: 0195-668X, 1522-9645
DOI: 10.1093/eurheartj/ehae666.3443