MSDiagnosis: A Benchmark for Evaluating Large Language Models in Multi-Step Clinical Diagnosis
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Clinical diagnosis is critical in medical practice and typically requires a
continuous, evolving process that includes primary diagnosis, differential
diagnosis, and final diagnosis. However, most existing clinical diagnostic
tasks are single-step processes, which do not align with the complex
multi-step diagnostic procedures found in real-world clinical settings. In this
paper, we propose a Chinese clinical diagnostic benchmark called MSDiagnosis.
This benchmark consists of 2,225 cases from 12 departments, covering tasks such
as primary diagnosis, differential diagnosis, and final diagnosis.
Additionally, we propose a novel and effective framework that combines forward
inference, backward inference, reflection, and refinement, enabling the large
language model to self-evaluate and adjust its diagnostic results. We test
open-source models, closed-source models, and our proposed framework. The
experimental results demonstrate the effectiveness of the proposed method. We
also provide a comprehensive experimental analysis and suggest future research
directions for this task. |
---|---|
DOI: | 10.48550/arxiv.2408.10039 |
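The abstract names four components of the proposed framework (forward inference, backward inference, reflection, and refinement) but does not say how they interact. The Python sketch below shows one plausible self-evaluation loop under stated assumptions: the model is any function from a prompt string to a reply string, the backward check answers CONSISTENT or INCONSISTENT, and the loop stops after `max_rounds`. All names (`diagnose_with_reflection`, `DiagnosisTrace`, `toy_llm`) and the prompt formats are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: any function mapping a prompt to a reply.
LLM = Callable[[str], str]


@dataclass
class DiagnosisTrace:
    """Result of one self-evaluation run (hypothetical structure, not from the paper)."""
    diagnosis: str
    rounds: int
    history: List[str] = field(default_factory=list)


def diagnose_with_reflection(case_text: str, llm: LLM, max_rounds: int = 3) -> DiagnosisTrace:
    # Forward inference: propose a diagnosis directly from the case description.
    diagnosis = llm(f"FORWARD\nCase: {case_text}\nGive the most likely diagnosis.")
    history = [diagnosis]
    for round_idx in range(1, max_rounds + 1):
        # Backward inference: ask whether the findings implied by the diagnosis
        # are consistent with the recorded case.
        check = llm(
            f"BACKWARD\nCase: {case_text}\nDiagnosis: {diagnosis}\n"
            "Do the expected findings match the case? Answer CONSISTENT or INCONSISTENT."
        )
        if check.strip().upper().startswith("CONSISTENT"):
            return DiagnosisTrace(diagnosis, round_idx, history)
        # Reflection: ask the model to explain what the current answer fails to explain.
        critique = llm(f"REFLECT\nCase: {case_text}\nDiagnosis: {diagnosis}\nCheck: {check}")
        # Refinement: revise the diagnosis using the critique.
        diagnosis = llm(f"REFINE\nCase: {case_text}\nDiagnosis: {diagnosis}\nCritique: {critique}")
        history.append(diagnosis)
    # Round budget exhausted: return the latest (unverified) diagnosis.
    return DiagnosisTrace(diagnosis, max_rounds, history)


def toy_llm(prompt: str) -> str:
    # Deterministic stand-in for a real model, for illustration only.
    if prompt.startswith("FORWARD"):
        return "gastritis"
    if prompt.startswith("BACKWARD"):
        return "CONSISTENT" if "Diagnosis: appendicitis" in prompt else "INCONSISTENT"
    if prompt.startswith("REFLECT"):
        return "Right lower quadrant pain is not explained."
    return "appendicitis"  # REFINE step


if __name__ == "__main__":
    trace = diagnose_with_reflection("Right lower quadrant pain, fever, elevated WBC.", toy_llm)
    print(trace)
```

With the toy model, the forward step proposes "gastritis", the backward check rejects it, and one reflect-and-refine round yields "appendicitis", which passes the check in round 2. Capping the loop with `max_rounds` keeps the cost bounded when the model never reaches a consistent answer.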