ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions
Published in: JMIR Medical Education, 2023-12, Vol. 9, p. e49183
Format: Article
Language: English
Online Access: Full text
Abstract: Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more "consultations" of LLMs about personal medical symptoms.
This study aims to evaluate ChatGPT's performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants' answers.
We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. In a blinded setting, they also identified whether each answer was created by an ORL consultant or by ChatGPT. Additionally, the character counts of the answers were compared. Given the rapid pace of technological development, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs.
Ratings in all categories were significantly higher for ORL consultants (P
ISSN: 2369-3762
DOI: 10.2196/49183