Ambient AI Scribing Support: Comparing the Performance of Specialized AI Agentic Architecture to Leading Foundational Models
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Summary: | This study compares Sporo Health's AI Scribe, a proprietary model fine-tuned
for medical scribing, with various LLMs (GPT-4o, GPT-3.5, Gemma-9B, and
Llama-3.2-3B) in clinical documentation. We analyzed de-identified patient
transcripts from partner clinics, using clinician-provided SOAP notes as the
ground truth. Each model generated SOAP summaries using zero-shot prompting,
with performance assessed via recall, precision, and F1 scores. Sporo
outperformed all models, achieving the highest recall (73.3%), precision
(78.6%), and F1 score (75.3%) with the lowest performance variance.
Statistically significant differences (p < 0.05) were found between Sporo and
the other models, with post-hoc tests showing significant improvements over
GPT-3.5, Gemma-9B, and Llama-3.2-3B. While Sporo outperformed GPT-4o by up to
10%, the difference was not statistically significant (p = 0.25). Clinical user
satisfaction, measured with a modified PDQI-9 inventory, favored Sporo.
Evaluations indicated Sporo's outputs were more accurate and relevant. This
highlights the potential of Sporo's multi-agentic architecture to improve
clinical workflows. |
---|---|
DOI: | 10.48550/arxiv.2411.06713 |
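
The summary reports recall, precision, and F1 against clinician-written SOAP notes but does not say how individual items were matched. As a minimal sketch only, assuming each note can be reduced to a set of discrete clinical facts (the function name `score_note` and the set-based matching are illustrative assumptions, not the study's method), the three metrics could be computed like this:

```python
def score_note(reference_items, generated_items):
    """Return (precision, recall, f1) for one generated note.

    Assumption: both notes have already been reduced to collections of
    comparable items (e.g. normalized clinical facts). The study's actual
    matching procedure is not described in the summary.
    """
    reference = set(reference_items)
    generated = set(generated_items)

    # Items present in both the clinician note and the generated note.
    true_positives = len(reference & generated)

    # Precision: share of generated items that appear in the reference.
    precision = true_positives / len(generated) if generated else 0.0
    # Recall: share of reference items that the generated note captured.
    recall = true_positives / len(reference) if reference else 0.0

    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, not taken from the study.
reference = ["chest pain", "hypertension", "metoprolol 25 mg", "follow-up in 2 weeks"]
generated = ["chest pain", "hypertension", "aspirin 81 mg"]
precision, recall, f1 = score_note(reference, generated)
```

In this hypothetical example, two of the three generated items match the reference and two of the four reference items are recovered, so precision is 2/3 and recall is 1/2.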