Benchmarking Medical LLMs on Anesthesiology: A Comprehensive Dataset in Chinese

Bibliographic details
Published in: IEEE Transactions on Emerging Topics in Computational Intelligence, 2025-01, p. 1-15
Main authors: Zhou, Bohao; Zhan, Yibing; Wang, Zhonghai; Li, Yanhong; Zhang, Chong; Yu, Baosheng; Ding, Liang; Jin, Hua; Liu, Weifeng; Wang, Xiongbin; Tao, Dapeng
Format: Article
Language: English
Description
Abstract: With the recent success of large language models (LLMs), interest in developing them for medical domains has increased. However, due to the lack of benchmark datasets, evaluating the capabilities of medical LLMs remains challenging, particularly in highly specialized fields such as anesthesiology. To address this gap, we introduce a comprehensive anesthesiology benchmark dataset in Chinese, the Chinese Anesthesiology Benchmark (CAB). This benchmark supports the evaluation of medical LLMs for anesthesiology along three dimensions: knowledge, application, and safety. Specifically, the CAB provides more than 8k questions collected from examinations and books for knowledge-level evaluation; more than 2k questions collected from online anesthesia consultations and hospitals for application-level evaluation; and 136 tests from seven anesthesia medical care scenarios for safety-level evaluation. With the proposed CAB dataset, we conducted a thorough evaluation of six medical LLMs, including Bianque-2 and HuatuoGPT-13B, and eleven general LLMs, including Qwen-7B-Chat and GPT-4. The results show that the capabilities of medical LLMs in anesthesiology still fall clearly short of those of medical students in the field of anesthesia. We hope that the proposed CAB dataset can facilitate the development of medical LLMs for anesthesiology.
ISSN: 2471-285X
DOI: 10.1109/TETCI.2024.3502465