Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE)

Bibliographic Details
Published in: Journal of Surgical Education 2024-11, Vol. 81 (11), p. 1645-1649
Authors: Hayes, Daniel S., Foster, Brian K., Makar, Gabriel, Manzar, Shahid, Ozdag, Yagiz, Shultz, Mason, Klena, Joel C., Grandizio, Louis C.
Format: Article
Language: English
Online access: Full text
Description
Abstract:
• Novel use of image analysis in artificial intelligence performance on the OITE.
• AI performed equally on questions with and without imaging components.
• Performance dropped when using image descriptions generated by AI.
• AI answered correctly at nearly double the rate of random guessing (49%).
• AI performance was worse than all resident classes on the OITE.

Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, our focus was on evaluating ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. Furthermore, we explored differences in performance when questions varied by text only or text with an associated image, including whether the image was described by AI or by a trained orthopaedist. Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. Because the capacity to input or interpret images was not publicly available in ChatGPT at the time of this study, questions with an image component were supplemented with descriptions generated by Microsoft Azure AI Vision Studio or by authors of the study. ChatGPT performed equally on OITE questions with or without imaging components, with average correct answer rates of 49% and 48% across all 3 input methods. Performance dropped by 6% when using image descriptions generated by AI. When using single-answer multiple-choice input methods, ChatGPT answered 49% of questions correctly, nearly double the rate of random guessing. The performance of ChatGPT was worse than that of all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents. ChatGPT performed below all resident classes on the 2019 OITE. Performance on text-only questions and questions with images was nearly equal when the image was described by a trained orthopaedic specialist, but decreased when using an AI-generated description. Recognizing the performance abilities of AI software may provide insight into the current and future applications of this technology in medical education.
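The workflow described in the abstract (appending an image description to the question stem before submitting it to GPT-4 and recording the single-letter answer) can be illustrated with a brief sketch. The snippet below is a hypothetical reconstruction, not the authors' code: it assumes the OpenAI Python SDK, the "gpt-4" model name, and a made-up helper build_prompt, and it treats image descriptions as plain strings regardless of whether they came from Microsoft Azure AI Vision Studio or from a trained orthopaedist.

```python
# Hypothetical sketch of the question-submission workflow described in the abstract.
# Assumptions (not from the paper): the OpenAI Python SDK (>=1.0), the "gpt-4" model
# name, and image descriptions already available as plain-text strings (produced
# either by Azure AI Vision Studio or by a trained orthopaedist).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(stem: str, choices: list[str], image_description: str | None) -> str:
    """Combine the OITE question stem, any image description, and the answer choices."""
    parts = [stem]
    if image_description:
        parts.append(f"Image description: {image_description}")
    parts.append("Answer choices:")
    parts.extend(f"{label}. {text}" for label, text in zip("ABCDE", choices))
    parts.append("Respond with the single best answer choice (letter only).")
    return "\n".join(parts)


def ask_question(stem: str, choices: list[str], image_description: str | None = None) -> str:
    """Submit one question to GPT-4 and return its answer text."""
    prompt = build_prompt(stem, choices, image_description)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for single-answer scoring
    )
    return response.choices[0].message.content.strip()


# Example usage: the same item could be submitted once as text only and once with an
# appended image description, then scored against the official answer key.
# answer = ask_question("A patient presents with ...", ["Choice A", "Choice B", "Choice C", "Choice D"])
```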
ISSN: 1931-7204
1878-7452
DOI: 10.1016/j.jsurg.2024.08.002