The performance of ChatGPT and Bing on a computerized adaptive test of verbal intelligence

Bibliographic details
Published in:	PloS one, 2024-07, Vol. 19 (7), p. e0307097
Main authors:	Klein, Balázs; Kovacs, Kristof
Format:	Article
Language:	English
Keywords:	Adolescent; Adult; Algorithms; Artificial intelligence; Biology and Life Sciences; Chatbots; Cognitive ability; Computer and Information Sciences; Evaluation; Female; Hallucinations; Human performance; Humans; Intelligence; Intelligence Tests; Item response theory; Language; Large language models; Male; Middle Aged; Performance evaluation; Psychometrics - methods; Quantitative psychology; Questions; Social Sciences; Verbal ability; Vocabulary; Young Adult

Description
Abstract:	We administered a computerized adaptive test of vocabulary three times to assess the verbal intelligence of ChatGPT (GPT-3.5) and Bing (based on GPT-4). There was no difference between their performance; both performed at a high level, outperforming approximately 95% of humans and scoring above the level of native speakers with a doctoral degree. In 42% of the test items that were administered more than once, these large language models provided different answers to the same question in different sessions. They never engaged in guessing, but they did produce hallucinations: answers that were not among the options. Such hallucinations were not triggered by an inability to answer correctly, as the same questions evoked correct answers in other sessions. The results indicate that psychometric tools developed for humans have limitations when assessing AI, but they also imply that computerized adaptive testing of verbal ability is an appropriate tool for critically evaluating the performance of large language models.
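The abstract reports two response-level statistics: the share of repeatedly administered items that received different answers across sessions, and "hallucinations," i.e. answers not among the offered options. The sketch below shows one plausible way to compute both from session logs. It is an illustration only, assuming each session is recorded as a mapping from item ID to the answer given; the function name `response_stats`, the item IDs, and the toy data are hypothetical and are not taken from the study, whose actual scoring procedure is not described in this record.

```python
from collections import defaultdict


def response_stats(sessions, options):
    """Hypothetical helper (not from the paper).

    sessions: list of dicts, one per test session, mapping item_id -> answer given.
    options:  dict mapping item_id -> set of answer options offered for that item.
    Returns (inconsistency_rate, hallucinations), where inconsistency_rate is the
    fraction of items administered more than once that received more than one
    distinct answer, and hallucinations lists (session_index, item_id, answer)
    for answers that were not among the offered options.
    """
    answers_by_item = defaultdict(list)
    hallucinations = []
    for s_idx, session in enumerate(sessions):
        for item, answer in session.items():
            answers_by_item[item].append(answer)
            # An answer outside the option set counts as a hallucination.
            if answer not in options[item]:
                hallucinations.append((s_idx, item, answer))

    # Only items seen in at least two sessions can be checked for consistency.
    repeated = [ans for ans in answers_by_item.values() if len(ans) > 1]
    if not repeated:
        return 0.0, hallucinations
    changed = sum(1 for ans in repeated if len(set(ans)) > 1)
    return changed / len(repeated), hallucinations


# Illustrative toy data (invented, not study data): three sessions, four options per item.
options = {item: {"a", "b", "c", "d"} for item in ("q1", "q2", "q3")}
sessions = [
    {"q1": "a", "q2": "b"},
    {"q1": "a", "q2": "c", "q3": "e"},  # "e" is not an offered option
    {"q2": "b", "q3": "d"},
]
rate, hallucinated = response_stats(sessions, options)
print(rate, hallucinated)
```

In this toy run, q1 is answered identically twice, while q2 and q3 each receive differing answers, so two of the three repeated items are inconsistent; the single out-of-options answer to q3 is reported as a hallucination. Counting a hallucinated answer as a "different answer" is a design assumption of this sketch.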
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0307097