Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases

Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodolog...

Bibliographic details
Published in: Japanese journal of radiology, 2024-12
Main authors: Sonoda, Yuki; Kurokawa, Ryo; Hagiwara, Akifumi; Asari, Yusuke; Fukushima, Takahiro; Kanzawa, Jun; Gonoi, Wataru; Abe, Osamu
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page
container_title Japanese journal of radiology
container_volume
creator Sonoda, Yuki
Kurokawa, Ryo
Hagiwara, Akifumi
Asari, Yusuke
Fukushima, Takahiro
Kanzawa, Jun
Gonoi, Wataru
Abe, Osamu
description Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology can enhance the LLM's medical diagnostic capabilities. Specifically, the approach uses a standardized template to first organize clinical information into predefined categories (patient information, history, symptoms, examinations, etc.) before making diagnoses, instead of one-step processing. Three hundred twenty-two quiz questions from Radiology's Diagnosis Please cases (1998-2023) were used. We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: (1) Baseline: a conventional zero-shot chain-of-thought prompt; (2) Two-step approach: the LLM first systematically organizes clinical information into two distinct categories (patient history and imaging findings), then separately analyzes this organized information to provide diagnoses; and (3) Summary-only approach: using only the LLM-generated summary for diagnoses. The two-step approach significantly outperformed both the baseline and summary-only approaches in diagnostic accuracy, as determined by McNemar's test. Primary diagnostic accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p = 0.042) and 56.3% for summary-only (p = 0.035). For the top three diagnoses, accuracy was 70.5%, 66.5%, and 65.5% for the two-step, baseline, and summary-only approaches, respectively (p = 0.005 versus baseline, p = 0.008 versus summary-only). No significant differences were observed between the baseline and summary-only approaches. Our results indicate that a structured clinical reasoning approach enhances the LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.
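Note on the statistical comparison: the description states that the approaches were compared with McNemar's test, which applies because every prompting approach is evaluated on the same cases. The following is a minimal sketch of the exact (binomial) form of that test, offered as an illustration only. The record does not say whether the authors used the exact or the chi-square variant, and it does not report the discordant-pair counts, so the function name `mcnemar_exact` and the counts in the usage lines are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases approach A answered correctly and approach B incorrectly
    c: cases approach A answered incorrectly and approach B correctly
    Cases where both approaches agree do not enter the test.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of any difference.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, each discordant pair favours either
    # approach with probability 0.5, so the smaller count follows
    # Binomial(n, 0.5). Sum the lower tail and double it.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only (NOT taken from the study).
p_value = mcnemar_exact(b=9, c=22)
print(f"two-sided exact McNemar p = {p_value:.4f}")
```

Only the discordant pairs matter here: two approaches with similar overall accuracy can still differ significantly if one of them is right on most of the cases where they disagree.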
doi_str_mv 10.1007/s11604-024-01712-2
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_miscellaneous_3140925143</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3140925143</sourcerecordid><originalsourceid>FETCH-LOGICAL-p1002-efae089b6ebac18a89367f198dff4805b211af4c6b632e0c7a1ef748878b8f633</originalsourceid><addsrcrecordid>eNpNkE9LxDAQxYMo7rr6BTxIbnqpZpI0TY-y-A8qHlTwVtJ0smZps92mPeint-IueBhm4P14vDeEnAO7BsaymwigmEwYnwYy4Ak_IHPQKkuA6Y_Df_eMnMS4ZkxJIeUxmYlc8TTN5ZysX4d-tMPYY01t44O3pqE9mrgJPqxo12_abqAYPk2wGGlRPF9GWnuzCps4eEut6UzlGz_4SfVhL_lIu2ZyQbod_fdERYyn5MiZJuLZbi_I-_3d2_IxKV4enpa3RdJNrXiCziDTeaWwMha00blQmYNc185JzdKKAxgnraqU4MhsZgBdJrXOdKWdEmJBrv58p_DbEeNQtj5abBoTcDPGUoBkOU9B_qIXO3SsWqzLrvet6b_K_X_ED2BlaeA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3140925143</pqid></control><display><type>article</type><title>Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases</title><source>SpringerNature Journals</source><creator>Sonoda, Yuki ; Kurokawa, Ryo ; Hagiwara, Akifumi ; Asari, Yusuke ; Fukushima, Takahiro ; Kanzawa, Jun ; Gonoi, Wataru ; Abe, Osamu</creator><creatorcontrib>Sonoda, Yuki ; Kurokawa, Ryo ; Hagiwara, Akifumi ; Asari, Yusuke ; Fukushima, Takahiro ; Kanzawa, Jun ; Gonoi, Wataru ; Abe, Osamu</creatorcontrib><description>Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology-specifically, using a standardized template to first organize clinical information into predefined categories (patient information, history, symptoms, examinations, etc.) before making diagnoses, instead of one-step processing-can enhance the LLM's medical diagnostic capabilities. 
Three hundred twenty two quiz questions from Radiology's Diagnosis Please cases (1998-2023) were used. We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: (1) Baseline: conventional zero-shot chain-of-thought prompt, (2) two-step approach: structured two-step approach: first, the LLM systematically organizes clinical information into two distinct categories (patient history and imaging findings), then separately analyzes this organized information to provide diagnoses, and (3) Summary-only approach: using only the LLM-generated summary for diagnoses. The two-step approach significantly outperformed the both baseline and summary-only approaches in diagnostic accuracy, as determined by McNemar's test. Primary diagnostic accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p = 0.042) and 56.3% for summary-only (p = 0.035). For the top three diagnoses, accuracy was 70.5, 66.5, and 65.5% respectively (p = 0.005 for baseline, p = 0.008 for summary-only). No significant differences were observed between the baseline and summary-only approaches. Our results indicate that a structured clinical reasoning approach enhances LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.</description><identifier>ISSN: 1867-108X</identifier><identifier>EISSN: 1867-108X</identifier><identifier>DOI: 10.1007/s11604-024-01712-2</identifier><identifier>PMID: 39625594</identifier><language>eng</language><publisher>Japan</publisher><ispartof>Japanese journal of radiology, 2024-12</ispartof><rights>2024. 
The Author(s).</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-5018-4683</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,781,785,27929,27930</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39625594$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sonoda, Yuki</creatorcontrib><creatorcontrib>Kurokawa, Ryo</creatorcontrib><creatorcontrib>Hagiwara, Akifumi</creatorcontrib><creatorcontrib>Asari, Yusuke</creatorcontrib><creatorcontrib>Fukushima, Takahiro</creatorcontrib><creatorcontrib>Kanzawa, Jun</creatorcontrib><creatorcontrib>Gonoi, Wataru</creatorcontrib><creatorcontrib>Abe, Osamu</creatorcontrib><title>Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases</title><title>Japanese journal of radiology</title><addtitle>Jpn J Radiol</addtitle><description>Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology-specifically, using a standardized template to first organize clinical information into predefined categories (patient information, history, symptoms, examinations, etc.) before making diagnoses, instead of one-step processing-can enhance the LLM's medical diagnostic capabilities. Three hundred twenty two quiz questions from Radiology's Diagnosis Please cases (1998-2023) were used. 
We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: (1) Baseline: conventional zero-shot chain-of-thought prompt, (2) two-step approach: structured two-step approach: first, the LLM systematically organizes clinical information into two distinct categories (patient history and imaging findings), then separately analyzes this organized information to provide diagnoses, and (3) Summary-only approach: using only the LLM-generated summary for diagnoses. The two-step approach significantly outperformed the both baseline and summary-only approaches in diagnostic accuracy, as determined by McNemar's test. Primary diagnostic accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p = 0.042) and 56.3% for summary-only (p = 0.035). For the top three diagnoses, accuracy was 70.5, 66.5, and 65.5% respectively (p = 0.005 for baseline, p = 0.008 for summary-only). No significant differences were observed between the baseline and summary-only approaches. Our results indicate that a structured clinical reasoning approach enhances LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. 
The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.</description><issn>1867-108X</issn><issn>1867-108X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpNkE9LxDAQxYMo7rr6BTxIbnqpZpI0TY-y-A8qHlTwVtJ0smZps92mPeint-IueBhm4P14vDeEnAO7BsaymwigmEwYnwYy4Ak_IHPQKkuA6Y_Df_eMnMS4ZkxJIeUxmYlc8TTN5ZysX4d-tMPYY01t44O3pqE9mrgJPqxo12_abqAYPk2wGGlRPF9GWnuzCps4eEut6UzlGz_4SfVhL_lIu2ZyQbod_fdERYyn5MiZJuLZbi_I-_3d2_IxKV4enpa3RdJNrXiCziDTeaWwMha00blQmYNc185JzdKKAxgnraqU4MhsZgBdJrXOdKWdEmJBrv58p_DbEeNQtj5abBoTcDPGUoBkOU9B_qIXO3SsWqzLrvet6b_K_X_ED2BlaeA</recordid><startdate>20241203</startdate><enddate>20241203</enddate><creator>Sonoda, Yuki</creator><creator>Kurokawa, Ryo</creator><creator>Hagiwara, Akifumi</creator><creator>Asari, Yusuke</creator><creator>Fukushima, Takahiro</creator><creator>Kanzawa, Jun</creator><creator>Gonoi, Wataru</creator><creator>Abe, Osamu</creator><scope>NPM</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-5018-4683</orcidid></search><sort><creationdate>20241203</creationdate><title>Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases</title><author>Sonoda, Yuki ; Kurokawa, Ryo ; Hagiwara, Akifumi ; Asari, Yusuke ; Fukushima, Takahiro ; Kanzawa, Jun ; Gonoi, Wataru ; Abe, Osamu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p1002-efae089b6ebac18a89367f198dff4805b211af4c6b632e0c7a1ef748878b8f633</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sonoda, Yuki</creatorcontrib><creatorcontrib>Kurokawa, Ryo</creatorcontrib><creatorcontrib>Hagiwara, Akifumi</creatorcontrib><creatorcontrib>Asari, 
Yusuke</creatorcontrib><creatorcontrib>Fukushima, Takahiro</creatorcontrib><creatorcontrib>Kanzawa, Jun</creatorcontrib><creatorcontrib>Gonoi, Wataru</creatorcontrib><creatorcontrib>Abe, Osamu</creatorcontrib><collection>PubMed</collection><collection>MEDLINE - Academic</collection><jtitle>Japanese journal of radiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sonoda, Yuki</au><au>Kurokawa, Ryo</au><au>Hagiwara, Akifumi</au><au>Asari, Yusuke</au><au>Fukushima, Takahiro</au><au>Kanzawa, Jun</au><au>Gonoi, Wataru</au><au>Abe, Osamu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases</atitle><jtitle>Japanese journal of radiology</jtitle><addtitle>Jpn J Radiol</addtitle><date>2024-12-03</date><risdate>2024</risdate><issn>1867-108X</issn><eissn>1867-108X</eissn><abstract>Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities. This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology-specifically, using a standardized template to first organize clinical information into predefined categories (patient information, history, symptoms, examinations, etc.) before making diagnoses, instead of one-step processing-can enhance the LLM's medical diagnostic capabilities. Three hundred twenty two quiz questions from Radiology's Diagnosis Please cases (1998-2023) were used. 
We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: (1) Baseline: conventional zero-shot chain-of-thought prompt, (2) two-step approach: structured two-step approach: first, the LLM systematically organizes clinical information into two distinct categories (patient history and imaging findings), then separately analyzes this organized information to provide diagnoses, and (3) Summary-only approach: using only the LLM-generated summary for diagnoses. The two-step approach significantly outperformed the both baseline and summary-only approaches in diagnostic accuracy, as determined by McNemar's test. Primary diagnostic accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p = 0.042) and 56.3% for summary-only (p = 0.035). For the top three diagnoses, accuracy was 70.5, 66.5, and 65.5% respectively (p = 0.005 for baseline, p = 0.008 for summary-only). No significant differences were observed between the baseline and summary-only approaches. Our results indicate that a structured clinical reasoning approach enhances LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.</abstract><cop>Japan</cop><pmid>39625594</pmid><doi>10.1007/s11604-024-01712-2</doi><orcidid>https://orcid.org/0000-0002-5018-4683</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1867-108X
ispartof Japanese journal of radiology, 2024-12
issn 1867-108X
1867-108X
language eng
recordid cdi_proquest_miscellaneous_3140925143
source SpringerNature Journals
title Structured clinical reasoning prompt enhances LLM's diagnostic capabilities in diagnosis please quiz cases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T16%3A25%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Structured%20clinical%20reasoning%20prompt%20enhances%20LLM's%20diagnostic%20capabilities%20in%20diagnosis%20please%20quiz%20cases&rft.jtitle=Japanese%20journal%20of%20radiology&rft.au=Sonoda,%20Yuki&rft.date=2024-12-03&rft.issn=1867-108X&rft.eissn=1867-108X&rft_id=info:doi/10.1007/s11604-024-01712-2&rft_dat=%3Cproquest_pubme%3E3140925143%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3140925143&rft_id=info:pmid/39625594&rfr_iscdi=true