Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the Medical Knowledge Self-Assessment Program
Published in: European Heart Journal, 2024-10, Vol. 45 (Supplement_1)
Format: Article
Language: English
Online access: Full text
Abstract
Background
Medical trainees are increasingly using online chat-based artificial intelligence (AI) platforms as supplementary resources for board exam preparation and clinical decision support. Prior studies have evaluated the performance of AI chatbots like ChatGPT on various general standardized tests such as the United States Medical Licensing Examination (USMLE), but little is known about their performance on subspecialty-focused exam questions, particularly related to clinical management and treatment.
Objective
This study aims to evaluate the performance of ChatGPT version 4.0 on the cardiovascular questions from the Medical Knowledge Self-Assessment Program (MKSAP) 19, a widely used resource for board exam preparation in the United States.
Methods
We submitted all cardiovascular questions from MKSAP 19 to ChatGPT 4.0, covering a broad range of cardiology topics in a multiple-choice format. Performance was gauged against both the official MKSAP answer key and average trainee scores obtained from the MKSAP website. Out of 129 questions, 4 were invalidated due to post-publication data, and 18 were excluded due to reliance on visual aids, leaving 107 questions for analysis.
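The question-filtering arithmetic described above can be reproduced directly. The sketch below (Python, using only the counts stated in this abstract) is illustrative and not part of the published methods.

```python
# Question filtering as described in the Methods (counts from the abstract).
TOTAL_QUESTIONS = 129   # all MKSAP 19 cardiovascular questions
INVALIDATED = 4         # invalidated by post-publication data
IMAGE_DEPENDENT = 18    # excluded for reliance on visual aids

analyzable = TOTAL_QUESTIONS - INVALIDATED - IMAGE_DEPENDENT
print(analyzable)       # 107 questions retained for analysis
```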
Results
ChatGPT 4.0 correctly answered 93 out of 107 questions, reflecting an 87% accuracy rate, compared to a 60% accuracy rate averaged among all human users for the same questions (p …
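The p-value is truncated in this record, and the abstract does not state which statistical test the authors used. As a rough check only, a one-sided binomial test of ChatGPT's 93/107 correct answers against the reported 60% human average could be run as follows; this is a sketch, not the authors' analysis.

```python
from scipy.stats import binomtest

# ChatGPT 4.0 results reported in the abstract.
correct, total = 93, 107      # 93/107 ≈ 87% accuracy
human_average = 0.60          # average accuracy of MKSAP users on the same questions

# One-sided binomial test: is ChatGPT's accuracy greater than 60%?
# (Illustrative only; the abstract does not specify the test used.)
result = binomtest(correct, total, p=human_average, alternative="greater")
print(f"accuracy = {correct / total:.1%}, p-value = {result.pvalue:.2e}")
```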
ISSN: 0195-668X, 1522-9645
DOI: 10.1093/eurheartj/ehae666.3443