Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Background: Cognitive biases in clinical decision-making significantly
contribute to errors in diagnosis and suboptimal patient outcomes. Addressing
these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in
mitigating these biases using a multi-agent framework. We simulate the
clinical decision-making process through multi-agent conversation and
evaluate the framework's efficacy in improving diagnostic accuracy.
Methods: Sixteen published and unpublished case reports in which cognitive
biases had resulted in misdiagnoses were identified from the literature. In
the multi-agent framework, we used GPT-4 to drive interactions among four
simulated agents that replicate clinical team dynamics. The four agents have
distinct roles: 1) making the final diagnosis after considering the
discussion, 2) acting as devil's advocate to correct confirmation and
anchoring bias, 3) tutoring and facilitating the discussion to reduce
premature closure bias, and 4) recording and summarizing the findings. A total
of 80 simulations were evaluated for the accuracy of the initial diagnosis,
the top differential diagnosis, and the final two differential diagnoses.
Results: Across the 80 responses, the initial diagnosis was correct in 0%
(0/80). After multi-agent discussion, accuracy rose to 71.3% (57/80) for the
top differential diagnosis and to 80.0% (64/80) for the final two
differential diagnoses.
Conclusions: The framework demonstrated an ability to re-evaluate and correct
misconceptions, even in scenarios with misleading initial investigations. The
LLM-driven multi-agent conversation framework shows promise in enhancing
diagnostic accuracy in diagnostically challenging medical scenarios. |
---|---|
DOI: | 10.48550/arxiv.2401.14589 |
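
The four-role conversation described in the Methods can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `Agent` class, the `run_discussion` function, the turn order, the number of rounds, and the prompt wording are all hypothetical, and `llm` is a stand-in for whatever model call is used (GPT-4 in the study). The role instructions are paraphrased from the summary. The helper `accuracy_percent` simply reproduces the reported percentages from the reported counts.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable, List

# Hypothetical stand-in: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Agent:
    name: str         # illustrative label, not taken from the paper
    instruction: str  # role description paraphrased from the summary


AGENTS = [
    Agent("diagnostician", "Make the final diagnosis after considering the discussion."),
    Agent("devils_advocate", "Challenge the diagnosis to correct confirmation and anchoring bias."),
    Agent("facilitator", "Tutor and facilitate the discussion to reduce premature closure bias."),
    Agent("recorder", "Record and summarize the findings."),
]


def run_discussion(case_text: str, llm: LLM, rounds: int = 2) -> List[str]:
    """Let each agent speak in turn, appending replies to a shared transcript.

    The fixed turn order and the number of rounds are assumptions made for
    this sketch.
    """
    transcript = [f"CASE: {case_text}"]
    for _ in range(rounds):
        for agent in AGENTS:
            prompt = (
                f"You are the {agent.name}. {agent.instruction}\n"
                + "\n".join(transcript)
            )
            transcript.append(f"{agent.name}: {llm(prompt)}")
    return transcript


def accuracy_percent(correct: int, total: int) -> str:
    """Percentage to one decimal place, rounding halves up."""
    pct = Decimal(100 * correct) / Decimal(total)
    return str(pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    # A trivial stub replaces the real model so the sketch runs offline.
    def stub(prompt: str) -> str:
        return "placeholder reply"

    for line in run_discussion("hypothetical case text", stub, rounds=1):
        print(line)
    print(accuracy_percent(57, 80), accuracy_percent(64, 80))
```

The exact value of 57/80 is 71.25%, so the reported 71.3% corresponds to rounding halves up; Python's built-in `round` rounds exact halves to the nearest even digit, which is why the sketch uses `decimal` instead.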