Towards Accurate Differential Diagnosis with Large Language Models

An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new oppo...

Detailed description

Bibliographic details
Published in:	arXiv.org 2023-11
Main authors:	McDuff, Daniel; Schaekermann, Mike; Tu, Tao; Palepu, Anil; Wang, Amy; Garrison, Jake; Singhal, Karan; Sharma, Yash; Azizi, Shekoofeh; Kulkarni, Kavita; Hou, Le; Cheng, Yong; Liu, Yun; Mahdavi, S Sara; Prakash, Sushant; Pathak, Anupam; Semturs, Christopher; Patel, Shwetak; Webster, Dale R; Dominowska, Ewa; Gottweis, Juraj; Barral, Joelle; Chou, Katherine; Corrado, Greg S; Matias, Yossi; Sunshine, Jake; Karthikesalingam, Alan; Natarajan, Vivek
Format:	Article
Language:	eng
Subjects:	Accuracy; Diagnosis; Health services; Iterative methods; Large language models; Reasoning; Search engines
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	McDuff, Daniel; Schaekermann, Mike; Tu, Tao; Palepu, Anil; Wang, Amy; Garrison, Jake; Singhal, Karan; Sharma, Yash; Azizi, Shekoofeh; Kulkarni, Kavita; Hou, Le; Cheng, Yong; Liu, Yun; Mahdavi, S Sara; Prakash, Sushant; Pathak, Anupam; Semturs, Christopher; Patel, Shwetak; Webster, Dale R; Dominowska, Ewa; Gottweis, Juraj; Barral, Joelle; Chou, Katherine; Corrado, Greg S; Matias, Yossi; Sunshine, Jake; Karthikesalingam, Alan; Natarajan, Vivek
description	An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2897285754</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2897285754</sourcerecordid><originalsourceid>FETCH-proquest_journals_28972857543</originalsourceid><addsrcrecordid>eNqNissKwjAURIMgWLT_EHBdqElj69InLnTXfbm0tzWlJJqb0N83Cz_AzcwZzixYIqTcZVUhxIqlRGOe52JfCqVkwk61ncF1xI9tGxx45Bfd9-jQeA1THDAYS5r4rP2LP8ANGNMMASI8bYcTbdiyh4kw_fWabW_X-nzP3s5-ApJvRhuciaoR1aEUlSpVIf97fQGuvDnC</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2897285754</pqid></control><display><type>article</type><title>Towards Accurate Differential Diagnosis with Large Language Models</title><source>Free E- Journals</source><creator>McDuff, Daniel ; Schaekermann, Mike ; Tu, Tao ; Palepu, Anil ; Wang, Amy ; Garrison, Jake ; Singhal, Karan ; Sharma, Yash ; Azizi, Shekoofeh ; Kulkarni, Kavita ; Hou, Le ; Cheng, Yong ; Liu, Yun ; Mahdavi, S Sara ; Prakash, Sushant ; Pathak, Anupam ; Semturs, Christopher ; Patel, Shwetak ; Webster, Dale R ; Dominowska, Ewa ; Gottweis, Juraj ; Barral, Joelle ; Chou, Katherine ; Corrado, Greg S ; Matias, Yossi ; Sunshine, Jake ; Karthikesalingam, Alan ; Natarajan, Vivek</creator><creatorcontrib>McDuff, Daniel ; Schaekermann, Mike ; Tu, Tao ; Palepu, Anil ; Wang, Amy ; Garrison, Jake ; Singhal, Karan ; Sharma, Yash ; Azizi, Shekoofeh ; Kulkarni, Kavita ; Hou, Le ; Cheng, Yong ; Liu, Yun ; Mahdavi, S Sara ; Prakash, Sushant ; Pathak, Anupam ; Semturs, Christopher ; Patel, Shwetak ; Webster, Dale R ; Dominowska, Ewa ; Gottweis, Juraj ; Barral, Joelle ; Chou, Katherine ; Corrado, Greg S ; Matias, Yossi ; Sunshine, Jake ; Karthikesalingam, Alan ; Natarajan, Vivek</creatorcontrib><description>An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, 
investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Accuracy ; Diagnosis ; Health services ; Iterative methods ; Large language models ; Reasoning ; Search engines</subject><ispartof>arXiv.org, 2023-11</ispartof><rights>2023. 
This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>778,782</link.rule.ids></links><search><creatorcontrib>McDuff, Daniel</creatorcontrib><creatorcontrib>Schaekermann, Mike</creatorcontrib><creatorcontrib>Tu, Tao</creatorcontrib><creatorcontrib>Palepu, Anil</creatorcontrib><creatorcontrib>Wang, Amy</creatorcontrib><creatorcontrib>Garrison, Jake</creatorcontrib><creatorcontrib>Singhal, Karan</creatorcontrib><creatorcontrib>Sharma, Yash</creatorcontrib><creatorcontrib>Azizi, Shekoofeh</creatorcontrib><creatorcontrib>Kulkarni, Kavita</creatorcontrib><creatorcontrib>Hou, Le</creatorcontrib><creatorcontrib>Cheng, Yong</creatorcontrib><creatorcontrib>Liu, Yun</creatorcontrib><creatorcontrib>Mahdavi, S Sara</creatorcontrib><creatorcontrib>Prakash, Sushant</creatorcontrib><creatorcontrib>Pathak, Anupam</creatorcontrib><creatorcontrib>Semturs, Christopher</creatorcontrib><creatorcontrib>Patel, Shwetak</creatorcontrib><creatorcontrib>Webster, Dale R</creatorcontrib><creatorcontrib>Dominowska, Ewa</creatorcontrib><creatorcontrib>Gottweis, Juraj</creatorcontrib><creatorcontrib>Barral, Joelle</creatorcontrib><creatorcontrib>Chou, Katherine</creatorcontrib><creatorcontrib>Corrado, Greg S</creatorcontrib><creatorcontrib>Matias, Yossi</creatorcontrib><creatorcontrib>Sunshine, Jake</creatorcontrib><creatorcontrib>Karthikesalingam, Alan</creatorcontrib><creatorcontrib>Natarajan, Vivek</creatorcontrib><title>Towards Accurate Differential Diagnosis with Large Language Models</title><title>arXiv.org</title><description>An accurate differential diagnosis (DDx) 
is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. 
Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.</description><subject>Accuracy</subject><subject>Diagnosis</subject><subject>Health services</subject><subject>Iterative methods</subject><subject>Large language models</subject><subject>Reasoning</subject><subject>Search engines</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNissKwjAURIMgWLT_EHBdqElj69InLnTXfbm0tzWlJJqb0N83Cz_AzcwZzixYIqTcZVUhxIqlRGOe52JfCqVkwk61ncF1xI9tGxx45Bfd9-jQeA1THDAYS5r4rP2LP8ANGNMMASI8bYcTbdiyh4kw_fWabW_X-nzP3s5-ApJvRhuciaoR1aEUlSpVIf97fQGuvDnC</recordid><startdate>20231130</startdate><enddate>20231130</enddate><creator>McDuff, Daniel</creator><creator>Schaekermann, Mike</creator><creator>Tu, Tao</creator><creator>Palepu, Anil</creator><creator>Wang, Amy</creator><creator>Garrison, Jake</creator><creator>Singhal, Karan</creator><creator>Sharma, Yash</creator><creator>Azizi, Shekoofeh</creator><creator>Kulkarni, Kavita</creator><creator>Hou, Le</creator><creator>Cheng, Yong</creator><creator>Liu, Yun</creator><creator>Mahdavi, S Sara</creator><creator>Prakash, Sushant</creator><creator>Pathak, Anupam</creator><creator>Semturs, Christopher</creator><creator>Patel, Shwetak</creator><creator>Webster, Dale R</creator><creator>Dominowska, Ewa</creator><creator>Gottweis, Juraj</creator><creator>Barral, Joelle</creator><creator>Chou, Katherine</creator><creator>Corrado, Greg S</creator><creator>Matias, Yossi</creator><creator>Sunshine, Jake</creator><creator>Karthikesalingam, Alan</creator><creator>Natarajan, 
Vivek</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20231130</creationdate><title>Towards Accurate Differential Diagnosis with Large Language Models</title><author>McDuff, Daniel ; Schaekermann, Mike ; Tu, Tao ; Palepu, Anil ; Wang, Amy ; Garrison, Jake ; Singhal, Karan ; Sharma, Yash ; Azizi, Shekoofeh ; Kulkarni, Kavita ; Hou, Le ; Cheng, Yong ; Liu, Yun ; Mahdavi, S Sara ; Prakash, Sushant ; Pathak, Anupam ; Semturs, Christopher ; Patel, Shwetak ; Webster, Dale R ; Dominowska, Ewa ; Gottweis, Juraj ; Barral, Joelle ; Chou, Katherine ; Corrado, Greg S ; Matias, Yossi ; Sunshine, Jake ; Karthikesalingam, Alan ; Natarajan, Vivek</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28972857543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Accuracy</topic><topic>Diagnosis</topic><topic>Health services</topic><topic>Iterative methods</topic><topic>Large language models</topic><topic>Reasoning</topic><topic>Search engines</topic><toplevel>online_resources</toplevel><creatorcontrib>McDuff, Daniel</creatorcontrib><creatorcontrib>Schaekermann, Mike</creatorcontrib><creatorcontrib>Tu, Tao</creatorcontrib><creatorcontrib>Palepu, Anil</creatorcontrib><creatorcontrib>Wang, Amy</creatorcontrib><creatorcontrib>Garrison, Jake</creatorcontrib><creatorcontrib>Singhal, Karan</creatorcontrib><creatorcontrib>Sharma, Yash</creatorcontrib><creatorcontrib>Azizi, Shekoofeh</creatorcontrib><creatorcontrib>Kulkarni, 
Kavita</creatorcontrib><creatorcontrib>Hou, Le</creatorcontrib><creatorcontrib>Cheng, Yong</creatorcontrib><creatorcontrib>Liu, Yun</creatorcontrib><creatorcontrib>Mahdavi, S Sara</creatorcontrib><creatorcontrib>Prakash, Sushant</creatorcontrib><creatorcontrib>Pathak, Anupam</creatorcontrib><creatorcontrib>Semturs, Christopher</creatorcontrib><creatorcontrib>Patel, Shwetak</creatorcontrib><creatorcontrib>Webster, Dale R</creatorcontrib><creatorcontrib>Dominowska, Ewa</creatorcontrib><creatorcontrib>Gottweis, Juraj</creatorcontrib><creatorcontrib>Barral, Joelle</creatorcontrib><creatorcontrib>Chou, Katherine</creatorcontrib><creatorcontrib>Corrado, Greg S</creatorcontrib><creatorcontrib>Matias, Yossi</creatorcontrib><creatorcontrib>Sunshine, Jake</creatorcontrib><creatorcontrib>Karthikesalingam, Alan</creatorcontrib><creatorcontrib>Natarajan, Vivek</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>McDuff, Daniel</au><au>Schaekermann, Mike</au><au>Tu, Tao</au><au>Palepu, Anil</au><au>Wang, Amy</au><au>Garrison, Jake</au><au>Singhal, Karan</au><au>Sharma, Yash</au><au>Azizi, Shekoofeh</au><au>Kulkarni, Kavita</au><au>Hou, Le</au><au>Cheng, Yong</au><au>Liu, Yun</au><au>Mahdavi, S Sara</au><au>Prakash, Sushant</au><au>Pathak, Anupam</au><au>Semturs, Christopher</au><au>Patel, Shwetak</au><au>Webster, Dale R</au><au>Dominowska, Ewa</au><au>Gottweis, Juraj</au><au>Barral, Joelle</au><au>Chou, Katherine</au><au>Corrado, Greg S</au><au>Matias, Yossi</au><au>Sunshine, Jake</au><au>Karthikesalingam, Alan</au><au>Natarajan, Vivek</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Towards Accurate Differential Diagnosis with Large Language Models</atitle><jtitle>arXiv.org</jtitle><date>2023-11-30</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. 
Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-11
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2897285754
source	Free E- Journals
subjects	Accuracy; Diagnosis; Health services; Iterative methods; Large language models; Reasoning; Search engines
title	Towards Accurate Differential Diagnosis with Large Language Models
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A58%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Towards%20Accurate%20Differential%20Diagnosis%20with%20Large%20Language%20Models&rft.jtitle=arXiv.org&rft.au=McDuff,%20Daniel&rft.date=2023-11-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2897285754%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2897285754&rft_id=info:pmid/&rfr_iscdi=true
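
The description field above reports results as top-10 accuracy and as McNemar's test statistics for paired comparisons. The following is a minimal sketch of how such quantities are commonly computed, not the authors' analysis code. The function names, the example diagnoses, and the counts are hypothetical, and the record does not say whether the study applied a continuity correction, so the uncorrected chi-square form is assumed here.

```python
from math import erfc, sqrt


def top_k_accuracy(ranked_lists, truths, k=10):
    """Fraction of cases whose true diagnosis appears among the first k entries."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


def mcnemar(b, c):
    """Uncorrected McNemar chi-square statistic and p-value (1 degree of freedom).

    b: paired cases correct only under condition A (hypothetical count)
    c: paired cases correct only under condition B (hypothetical count)
    """
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: three cases, each with a ranked differential list.
ranked = [["flu", "covid"], ["gout", "lupus"], ["asthma", "copd"]]
truth = ["covid", "sepsis", "asthma"]
acc_top1 = top_k_accuracy(ranked, truth, k=1)
acc_top2 = top_k_accuracy(ranked, truth, k=2)

# Hypothetical discordant-pair counts, not taken from the study.
stat, p_value = mcnemar(15, 5)
```

Only the discordant pairs (cases where exactly one condition was correct) enter McNemar's statistic, which is why it suits a design in which the same cases are assessed under two conditions.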