Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports

We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention. This study included 205 patients, each with two consecutive radiological reports. We des...

Full description

Bibliographic details
Published in: Academic radiology 2024-12
Main authors: Han, Na Yeon, Shin, Keewon, Kim, Min Ju, Park, Beom Jin, Sim, Ki Choon, Han, Yeo Eun, Sung, Deuk Jae, Choi, Jae Woong, Yeom, Suk Keu
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title Academic radiology
container_volume
creator Han, Na Yeon
Shin, Keewon
Kim, Min Ju
Park, Beom Jin
Sim, Ki Choon
Han, Yeo Eun
Sung, Deuk Jae
Choi, Jae Woong
Yeom, Suk Keu
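description We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention. This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), “benign”, “no tumor description,” and “other malignancy.” The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories. In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P < 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001). This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.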
doi_str_mv 10.1016/j.acra.2024.10.050
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_miscellaneous_3146624085</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1076633224008377</els_id><sourcerecordid>3146624085</sourcerecordid><originalsourceid>FETCH-LOGICAL-e1530-7dffa38ad42c1cf9ba4f73e804fd5c2925c5d0fa26b1d22e73f8530b495dfcdc3</originalsourceid><addsrcrecordid>eNpdUc2O0zAYtBCIXQovwAH5yCVd_-WniEtUdctKRYtoOVuO_SV1ldjFTir18fbNcLTLZS_2aDzz-dMMQp8pWVJCi7vTUumglowwkYglyckbdEurssoEEcXbhElZZAXn7AZ9iPFECM2Lir9HN3xV5JUoxS162rijctq6Dj867XvfWa16vJ_CBWzfpyfAh2PwU3fEOxU6SKfrJpXAT2-gz-oYbRzB4Nqp_prwN1zjtR_OKqjRXgDvx8lcsW_x9tchE1g5g7cwWGexdXhzUf2UdK--f4hxgojvgx_wHoJNVN0Yn1wJrQ94r5XDv-Hswxg_onet6iN8erkX6M_95rD-ke0etw_repcBzTnJStO2ilfKCKapbleNEm3JoSKiNblmK5br3JBWsaKhhjEoeVslXyNWuWm10XyBvj7PPQf_N203ysFGDXNG4KcoORVFwQRJrgX68iKdmgGMPAc7qHCV_2NPgu_PAkgLXywEGbWFlLWxAfQojbeSEjm3LE9yblnOLc9capn_A-70nQE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3146624085</pqid></control><display><type>article</type><title>Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Han, Na Yeon ; Shin, Keewon ; Kim, Min Ju ; Park, Beom Jin ; Sim, Ki Choon ; Han, Yeo Eun ; Sung, Deuk Jae ; Choi, Jae Woong ; Yeom, Suk Keu</creator><creatorcontrib>Han, Na Yeon ; Shin, Keewon ; Kim, Min Ju ; Park, Beom Jin ; Sim, Ki Choon ; Han, Yeo Eun ; Sung, Deuk Jae ; Choi, Jae Woong ; Yeom, Suk Keu</creatorcontrib><description>We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention. 
This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), “benign”, “no tumor description,” and “other malignancy.” The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories. In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P &lt; 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P &lt; 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P &lt; 0.001). This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. 
GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.</description><identifier>ISSN: 1076-6332</identifier><identifier>ISSN: 1878-4046</identifier><identifier>EISSN: 1878-4046</identifier><identifier>DOI: 10.1016/j.acra.2024.10.050</identifier><identifier>PMID: 39658474</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Artificial Intelligence ; Large Language model ; Multidetector Computed Tomography ; Oncology ; Radiology Report</subject><ispartof>Academic radiology, 2024-12</ispartof><rights>2024 The Association of University Radiologists</rights><rights>Copyright © 2024 The Association of University Radiologists. All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-0979-9835</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39658474$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Han, Na Yeon</creatorcontrib><creatorcontrib>Shin, Keewon</creatorcontrib><creatorcontrib>Kim, Min Ju</creatorcontrib><creatorcontrib>Park, Beom Jin</creatorcontrib><creatorcontrib>Sim, Ki Choon</creatorcontrib><creatorcontrib>Han, Yeo Eun</creatorcontrib><creatorcontrib>Sung, Deuk Jae</creatorcontrib><creatorcontrib>Choi, Jae Woong</creatorcontrib><creatorcontrib>Yeom, Suk Keu</creatorcontrib><title>Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports</title><title>Academic radiology</title><addtitle>Acad 
Radiol</addtitle><description>We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention. This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), “benign”, “no tumor description,” and “other malignancy.” The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories. In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P &lt; 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P &lt; 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P &lt; 0.001). This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. 
GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.</description><subject>Artificial Intelligence</subject><subject>Large Language model</subject><subject>Multidetector Computed Tomography</subject><subject>Oncology</subject><subject>Radiology Report</subject><issn>1076-6332</issn><issn>1878-4046</issn><issn>1878-4046</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpdUc2O0zAYtBCIXQovwAH5yCVd_-WniEtUdctKRYtoOVuO_SV1ldjFTir18fbNcLTLZS_2aDzz-dMMQp8pWVJCi7vTUumglowwkYglyckbdEurssoEEcXbhElZZAXn7AZ9iPFECM2Lir9HN3xV5JUoxS162rijctq6Dj867XvfWa16vJ_CBWzfpyfAh2PwU3fEOxU6SKfrJpXAT2-gz-oYbRzB4Nqp_prwN1zjtR_OKqjRXgDvx8lcsW_x9tchE1g5g7cwWGexdXhzUf2UdK--f4hxgojvgx_wHoJNVN0Yn1wJrQ94r5XDv-Hswxg_onet6iN8erkX6M_95rD-ke0etw_repcBzTnJStO2ilfKCKapbleNEm3JoSKiNblmK5br3JBWsaKhhjEoeVslXyNWuWm10XyBvj7PPQf_N203ysFGDXNG4KcoORVFwQRJrgX68iKdmgGMPAc7qHCV_2NPgu_PAkgLXywEGbWFlLWxAfQojbeSEjm3LE9yblnOLc9capn_A-70nQE</recordid><startdate>20241209</startdate><enddate>20241209</enddate><creator>Han, Na Yeon</creator><creator>Shin, Keewon</creator><creator>Kim, Min Ju</creator><creator>Park, Beom Jin</creator><creator>Sim, Ki Choon</creator><creator>Han, Yeo Eun</creator><creator>Sung, Deuk Jae</creator><creator>Choi, Jae Woong</creator><creator>Yeom, Suk Keu</creator><general>Elsevier Inc</general><scope>6I.</scope><scope>AAFTH</scope><scope>NPM</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0979-9835</orcidid></search><sort><creationdate>20241209</creationdate><title>Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports</title><author>Han, Na Yeon ; Shin, Keewon ; Kim, Min Ju ; Park, Beom Jin ; Sim, Ki Choon ; Han, Yeo Eun ; Sung, Deuk Jae ; Choi, Jae 
Woong ; Yeom, Suk Keu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-e1530-7dffa38ad42c1cf9ba4f73e804fd5c2925c5d0fa26b1d22e73f8530b495dfcdc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial Intelligence</topic><topic>Large Language model</topic><topic>Multidetector Computed Tomography</topic><topic>Oncology</topic><topic>Radiology Report</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Na Yeon</creatorcontrib><creatorcontrib>Shin, Keewon</creatorcontrib><creatorcontrib>Kim, Min Ju</creatorcontrib><creatorcontrib>Park, Beom Jin</creatorcontrib><creatorcontrib>Sim, Ki Choon</creatorcontrib><creatorcontrib>Han, Yeo Eun</creatorcontrib><creatorcontrib>Sung, Deuk Jae</creatorcontrib><creatorcontrib>Choi, Jae Woong</creatorcontrib><creatorcontrib>Yeom, Suk Keu</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>PubMed</collection><collection>MEDLINE - Academic</collection><jtitle>Academic radiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Na Yeon</au><au>Shin, Keewon</au><au>Kim, Min Ju</au><au>Park, Beom Jin</au><au>Sim, Ki Choon</au><au>Han, Yeo Eun</au><au>Sung, Deuk Jae</au><au>Choi, Jae Woong</au><au>Yeom, Suk Keu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports</atitle><jtitle>Academic radiology</jtitle><addtitle>Acad Radiol</addtitle><date>2024-12-09</date><risdate>2024</risdate><issn>1076-6332</issn><issn>1878-4046</issn><eissn>1878-4046</eissn><abstract>We aimed to compare the 
capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention. This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), “benign”, “no tumor description,” and “other malignancy.” The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories. In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P &lt; 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P &lt; 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P &lt; 0.001). This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>39658474</pmid><doi>10.1016/j.acra.2024.10.050</doi><orcidid>https://orcid.org/0000-0003-0979-9835</orcidid><oa>free_for_read</oa></addata></record>
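The precision, recall, and F1-scores quoted in the abstract can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below is only an illustration: it assumes the standard F1 definition and uses the two-decimal values as printed in the abstract, so agreement is expected only up to rounding. The function name `f1_score` and the `reported` dictionary are hypothetical names introduced here, not anything taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the abstract; they are already rounded to two decimals.
reported = {
    "GPT-4": {"precision": 0.68, "recall": 0.91, "f1": 0.78},
    "Gemini": {"precision": 0.63, "recall": 0.80, "f1": 0.70},
}

for model, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{model}: recomputed F1 = {recomputed:.2f}, reported F1 = {m['f1']:.2f}")
```

For both models the recomputed value agrees with the reported F1-score once rounded to two decimals, which suggests the three figures in the abstract are internally consistent.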
fulltext fulltext
identifier ISSN: 1076-6332
ispartof Academic radiology, 2024-12
issn 1076-6332
1878-4046
1878-4046
language eng
recordid cdi_proquest_miscellaneous_3146624085
source ScienceDirect Journals (5 years ago - present)
subjects Artificial Intelligence
Large Language model
Multidetector Computed Tomography
Oncology
Radiology Report
title Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T16%3A49%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20Oncological%20Surveillance%20Through%20Large%20Language%20Model-Assisted%20Analysis:%20A%20Comparative%20Study%20of%20GPT-4%20and%20Gemini%20in%20Evaluating%20Oncological%20Issues%20From%20Serial%20Abdominal%20CT%20Scan%20Reports&rft.jtitle=Academic%20radiology&rft.au=Han,%20Na%20Yeon&rft.date=2024-12-09&rft.issn=1076-6332&rft.eissn=1878-4046&rft_id=info:doi/10.1016/j.acra.2024.10.050&rft_dat=%3Cproquest_pubme%3E3146624085%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3146624085&rft_id=info:pmid/39658474&rft_els_id=S1076633224008377&rfr_iscdi=true
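The `url` field above is an OpenURL link-resolver request (`ctx_ver=Z39.88-2004`) that carries the citation as percent-encoded key/value pairs. As a rough sketch of how such a link could be decoded, the example below uses only Python's standard `urllib.parse`. The `OPENURL` string is an abbreviated copy of the record's URL that keeps just a few of its parameters, and the helper names `parse_openurl` and `identifiers` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Abbreviated copy of the record's link-resolver URL: only a subset of the
# original query parameters is kept here, for readability.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Academic%20radiology"
    "&rft.au=Han,%20Na%20Yeon"
    "&rft.date=2024-12-09"
    "&rft.issn=1076-6332"
    "&rft.eissn=1878-4046"
    "&rft_id=info:doi/10.1016/j.acra.2024.10.050"
    "&rft_id=info:pmid/39658474"
)


def parse_openurl(url):
    """Return the percent-decoded query parameters; repeated keys keep all values."""
    return parse_qs(urlsplit(url).query)


def identifiers(params):
    """Map scheme -> value for rft_id entries shaped like 'info:<scheme>/<value>'."""
    ids = {}
    for value in params.get("rft_id", []):
        if value.startswith("info:"):
            scheme, _, rest = value[len("info:"):].partition("/")
            if rest:
                ids[scheme] = rest
    return ids


params = parse_openurl(OPENURL)
print(params["rft.jtitle"][0], params["rft.date"][0])
print(identifiers(params))
```

`parse_qs` returns a list for every key, which matters here because `rft_id` occurs more than once in the URL (DOI and PMID).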