Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Summary: | We develop Polaris, the first safety-focused LLM constellation for real-time
patient-AI healthcare conversations. Unlike prior LLM works in healthcare
focusing on tasks like question answering, our work specifically focuses on
long multi-turn voice conversations. Our one-trillion parameter constellation
system is composed of several multibillion parameter LLMs as co-operative
agents: a stateful primary agent that focuses on driving an engaging
conversation and several specialist support agents focused on healthcare tasks
performed by nurses to increase safety and reduce hallucinations. We develop a
sophisticated training protocol for iterative co-training of the agents that
optimize for diverse objectives. We train our models on proprietary data,
clinical care plans, healthcare regulatory documents, medical manuals, and
other medical reasoning documents. We align our models to speak like medical
professionals, using organic healthcare conversations and simulated ones
between patient actors and experienced nurses. This allows our system to
express unique capabilities such as rapport building, trust building, empathy
and bedside manner. Finally, we present the first comprehensive clinician
evaluation of an LLM system for healthcare. We recruited over 1100
U.S.-licensed nurses and over 130 U.S.-licensed physicians to perform end-to-end
conversational evaluations of our system by posing as patients and rating the
system on several measures. We demonstrate Polaris performs on par with human
nurses on aggregate across dimensions such as medical safety, clinical
readiness, conversational quality, and bedside manner. Additionally, we conduct
a challenging task-based evaluation of the individual specialist support
agents, where we demonstrate our LLM agents significantly outperform a much
larger general-purpose LLM (GPT-4) as well as a model from their own
medium-size class (LLaMA-2 70B). |
---|---|
DOI: | 10.48550/arxiv.2403.13313 |