ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case-Based Questions

Bibliographic Details
Published in:	JMIR medical education 2023-12, Vol.9, p.e49183-e49183
Main authors:	Buhr, Christoph Raphael; Smith, Harry; Huppertz, Tilman; Bahr-Hamm, Katharina; Matthias, Christoph; Blaikie, Andrew; Kelsey, Tom; Kuhn, Sebastian; Eckrich, Jonas
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Chatbots; Consultants; Language; Likert scale; Mann-Whitney U test; Multimedia; Natural language processing; Neural networks; Original Paper; Otolaryngology; Ratings & rankings; Search engines; Software; Telemedicine
Online access:	Full text

Description
Abstract:	Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more "consultations" of LLMs about personal medical symptoms. This study aims to evaluate ChatGPT's performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants' answers. We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) if the answer was created by an ORL consultant or ChatGPT. Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs. Ratings in all categories were significantly higher for ORL consultants (P
ISSN:	2369-3762
DOI:	10.2196/49183