B - 113 Assessing the Neuropsychology Information Base of Large Language Models
Published in: Archives of Clinical Neuropsychology, 2024-10, Vol. 39 (7), pp. 1214-1215
Format: Article
Language: English
Online access: Full text
Abstract
Objective
Research has demonstrated that Large Language Models (LLMs) can obtain passing scores on medical board-certification examinations and have made substantial improvements in recent years (e.g., ChatGPT-4 and ChatGPT-3.5 demonstrating an accuracy of 83.4% and 73.4%, respectively, on neurosurgical practice written board-certification questions). To date, the extent of LLMs’ neuropsychology domain information has not been investigated. This study is an initial exploration of ChatGPT-3.5, ChatGPT-4, and Gemini’s performance on mock clinical neuropsychology written board-certification examination questions.
Methods
Six hundred practice examination questions were obtained from the BRAIN American Academy of Clinical Neuropsychology (AACN) website. Data for specific question domains and pediatric subclassification were available for 300 items. Using an a priori prompting strategy, the questions were input into ChatGPT-3.5, ChatGPT-4, and Gemini. Responses were scored based on BRAIN AACN answer keys. Chi-squared tests assessed LLMs’ performance overall and within domains, and significance was set at p = 0.002 using Bonferroni correction.
Results
Across all six hundred items, ChatGPT-4 had superior accuracy (74%) to ChatGPT-3.5 (62.5%) and Gemini (52.7%; p’s …).
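The comparison described in the Methods can be illustrated with a minimal sketch. The following hypothetical Python example is not the authors' analysis code; it assumes per-model correct/incorrect counts reconstructed from the reported accuracies over 600 items and runs pairwise chi-squared tests against the Bonferroni-corrected threshold of p = 0.002.

```python
# Hypothetical sketch of the pairwise accuracy comparison (not the authors' code).
# Counts are reconstructed from the reported accuracies over 600 items.
from scipy.stats import chi2_contingency

results = {
    "ChatGPT-4":   {"correct": 444, "incorrect": 156},  # ~74% accuracy
    "ChatGPT-3.5": {"correct": 375, "incorrect": 225},  # ~62.5% accuracy
    "Gemini":      {"correct": 316, "incorrect": 284},  # ~52.7% accuracy
}

ALPHA = 0.002  # Bonferroni-corrected significance threshold from the abstract

# Pairwise chi-squared tests on 2x2 tables of correct/incorrect counts.
models = list(results)
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        a, b = models[i], models[j]
        table = [
            [results[a]["correct"], results[a]["incorrect"]],
            [results[b]["correct"], results[b]["incorrect"]],
        ]
        chi2, p, dof, _ = chi2_contingency(table)
        verdict = "significant" if p < ALPHA else "not significant"
        print(f"{a} vs {b}: chi2={chi2:.2f}, p={p:.4g} ({verdict} at alpha={ALPHA})")
```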
ISSN: 1873-5843
DOI: 10.1093/arclin/acae067.274