Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction

Bibliographic details
Published in:	IEEE journal of biomedical and health informatics 2024-11, p.1-14
Main authors:	Vo, Hung Q.; Wang, Lin; Wong, Kelvin K.; Ezeana, Chika F.; Yu, Xiaohui; Yang, Wei; Chang, Jenny; Nguyen, Hien V.; Wong, Stephen T.C.
Format:	Article
Language:	eng
Subjects:	Adaptation models; BI-RADS 3; Biological system modeling; Breast; Breast cancer; Data models; Decoding; Electronic Health Records (EHRs); Foundation Models; Large Language Models; Large Vision Models; Mammograms; Mammography; Multimodal Learning; Predictive models; Tabular Data; Training; Vision-Language Learning; Visualization
Online access:	Full text

container_end_page	14
container_issue
container_start_page	1
container_title	IEEE journal of biomedical and health informatics
container_volume
creator	Vo, Hung Q.; Wang, Lin; Wong, Kelvin K.; Ezeana, Chika F.; Yu, Xiaohui; Yang, Wei; Chang, Jenny; Nguyen, Hien V.; Wong, Stephen T.C.
description	Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases, including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs), holds promise for improving prediction. This study introduces a multimodal deep-learning model that uses mammogram datasets to evaluate breast cancer prediction. Our approach integrates frozen large-scale pretrained vision-language models and shows superior performance and stability compared with traditional image-tabular models across two public breast cancer datasets. By pairing frozen pretrained vision-language models with a lightweight trainable classifier, the model consistently outperforms conventional full fine-tuning methods. The observed improvements are significant. On the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 on the validation set and from 0.803 to 0.830 on the official test set. On the EMBED dataset, AUC improves from 0.780 to 0.805 on the validation set. In limited-data scenarios using Breast Imaging-Reporting and Data System category 3 (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study also underscores the benefits of vision-language models for jointly training on diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges related to non-aligned tabular features. Combining training data improves breast cancer prediction on the EMBED dataset, outperforming all other experiments. In summary, our research highlights the efficacy of frozen large-scale pretrained vision-language models for multimodal breast cancer prediction, offering superior performance and stability over conventional methods.
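The recipe in the abstract can be illustrated with a minimal NumPy sketch: a frozen pretrained backbone feeds a lightweight trainable classifier, and performance is reported as AUC. This is not the authors' implementation, and all of the following are hypothetical stand-ins:

- the random-projection "encoder" (`frozen_image_encoder`, `FROZEN_W`);
- the synthetic image and tabular data;
- the logistic-regression head (`train_linear_head`);
- concatenating image embeddings with tabular features as the fusion step.

The `roc_auc` helper uses the standard rank interpretation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import numpy as np

# Hypothetical stand-in for a frozen pretrained image encoder: a fixed random
# projection created once and never updated. A real pipeline would call a
# pretrained vision-language model here (assumption, not the paper's code).
_rng = np.random.default_rng(0)
FROZEN_W = _rng.standard_normal((64, 16)) / 8.0


def frozen_image_encoder(images: np.ndarray) -> np.ndarray:
    """Map flattened images (n, 64) to embeddings (n, 16) with fixed weights."""
    return np.tanh(images @ FROZEN_W)


def train_linear_head(features: np.ndarray, labels: np.ndarray,
                      lr: float = 0.1, epochs: int = 500):
    """Fit a logistic-regression head; (w, b) are the only trainable parameters."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels  # gradient of mean binary cross-entropy w.r.t. logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


def run_demo(n: int = 400, n_train: int = 300) -> float:
    rng = np.random.default_rng(1)
    images = rng.standard_normal((n, 64))   # synthetic flattened "images"
    tabular = rng.standard_normal((n, 3))   # synthetic clinical (tabular) features
    frozen_before = FROZEN_W.copy()

    # Image embeddings come from the frozen encoder; only the head is trained.
    feats = np.concatenate([frozen_image_encoder(images), tabular], axis=1)

    # Synthetic labels from a hidden linear rule, purely for illustration.
    labels = (feats @ rng.standard_normal(feats.shape[1]) > 0).astype(int)

    w, b = train_linear_head(feats[:n_train], labels[:n_train])
    assert np.array_equal(FROZEN_W, frozen_before)  # backbone weights untouched

    # Evaluate on the held-out rows only.
    return roc_auc(labels[n_train:], feats[n_train:] @ w + b)


auc = run_demo()
print(f"held-out AUC of the linear head: {auc:.3f}")
```

Gradient descent only updates `w` and `b`, so the backbone stays fixed. This is why such setups are typically far cheaper to train than full fine-tuning. The labels follow a hidden linear rule on synthetic data, so the printed AUC says nothing about mammography performance.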
doi_str_mv	10.1109/JBHI.2024.3507638
format	Article
fullrecord	<record><control><sourceid>crossref_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10769012</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10769012</ieee_id><sourcerecordid>10_1109_JBHI_2024_3507638</sourcerecordid><originalsourceid>FETCH-LOGICAL-c632-635fb64ee9559829a99f07e7c2b62b253ebda7c94c9dee52ca4f04f8de3aafb3</originalsourceid><addsrcrecordid>eNpNkN1Kw0AQhRdRsNQ-gODFvkDq_iSb7KUtra1UFBRvw2QzW1fTrOymgl774Ca0gnMzw-GcOfARcsnZlHOmr-9mq_VUMJFOZcZyJYsTMhJcFYkQrDj9u7lOz8kkxjfWT9FLWo3IzzL4b2zpBsIWk2igQfoYsAvgWqzpi4vOt8kG2u0etkjvfY1NpBCQdq9IF9ai6dwn0qXftzV0vRkaOgPzXvkWqfWB3u-bzu18PegBIXZ0Dq3BMNTUzgyRC3JmoYk4Oe4xeVounuerZPNwu57fbBKjpEiUzGylUkSdZboQGrS2LMfciEqJSmQSqxpyo1Oja8RMGEgtS21RowSwlRwTfvhqgo8xoC0_gttB-Co5KweO5cCxHDiWR4595uqQcYj4z58rzbiQvxm1cco</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction</title><source>IEEE Electronic Library (IEL)</source><creator>Vo, Hung Q. ; Wang, Lin ; Wong, Kelvin K. ; Ezeana, Chika F. ; Yu, Xiaohui ; Yang, Wei ; Chang, Jenny ; Nguyen, Hien V. ; Wong, Stephen T.C.</creator><creatorcontrib>Vo, Hung Q. ; Wang, Lin ; Wong, Kelvin K. ; Ezeana, Chika F. ; Yu, Xiaohui ; Yang, Wei ; Chang, Jenny ; Nguyen, Hien V. ; Wong, Stephen T.C.</creatorcontrib><description>Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases-including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs)-holds promise for improving prediction. This study introduces a multimodal deep-learning model leveraging mammogram datasets to evaluate breast cancer prediction. 
Our approach integrates frozen large-scale pretrained vision-language models, showcasing superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. The model consistently outperforms conventional full fine-tuning methods by using frozen pretrained vision-language models alongside a lightweight trainable classifier. The observed improvements are significant. In the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 during validation and from 0.803 to 0.830 for the official test set. Within the EMBED dataset, AUC improves from 0.780 to 0.805 during validation. In scenarios with limited data, using Breast Imaging-Reporting and Data System category three (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study underscores the benefits of vision-language models in jointly training diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges related to non-aligned tabular features. Combining training data enhances breast cancer prediction on the EMBED dataset, outperforming all other experiments. 
In summary, our research emphasizes the efficacy of frozen large-scale pretrained vision-language models in multimodal breast cancer prediction, offering superior performance and stability over conventional methods, reinforcing their potential for breast cancer prediction.</description><identifier>ISSN: 2168-2194</identifier><identifier>EISSN: 2168-2208</identifier><identifier>DOI: 10.1109/JBHI.2024.3507638</identifier><identifier>CODEN: IJBHA9</identifier><language>eng</language><publisher>IEEE</publisher><subject>Adaptation models ; BI-RADS 3 ; Biological system modeling ; Breast ; Breast cancer ; Data models ; Decoding ; Electronic Health Records (EHRs) ; Foundation Models ; Large Language Models ; Large Vision Models ; Mammograms ; Mammography ; Multimodal Learning ; Predictive models ; Tabular Data ; Training ; Vision-Language Learning ; Visualization</subject><ispartof>IEEE journal of biomedical and health informatics, 2024-11, p.1-14</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10769012$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids></links><search><creatorcontrib>Vo, Hung Q.</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Wong, Kelvin K.</creatorcontrib><creatorcontrib>Ezeana, Chika F.</creatorcontrib><creatorcontrib>Yu, Xiaohui</creatorcontrib><creatorcontrib>Yang, Wei</creatorcontrib><creatorcontrib>Chang, Jenny</creatorcontrib><creatorcontrib>Nguyen, Hien V.</creatorcontrib><creatorcontrib>Wong, Stephen T.C.</creatorcontrib><title>Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction</title><title>IEEE 
journal of biomedical and health informatics</title><addtitle>JBHI</addtitle><description>Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases-including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs)-holds promise for improving prediction. This study introduces a multimodal deep-learning model leveraging mammogram datasets to evaluate breast cancer prediction. Our approach integrates frozen large-scale pretrained vision-language models, showcasing superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. The model consistently outperforms conventional full fine-tuning methods by using frozen pretrained vision-language models alongside a lightweight trainable classifier. The observed improvements are significant. In the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 during validation and from 0.803 to 0.830 for the official test set. Within the EMBED dataset, AUC improves from 0.780 to 0.805 during validation. In scenarios with limited data, using Breast Imaging-Reporting and Data System category three (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study underscores the benefits of vision-language models in jointly training diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges related to non-aligned tabular features. Combining training data enhances breast cancer prediction on the EMBED dataset, outperforming all other experiments. 
In summary, our research emphasizes the efficacy of frozen large-scale pretrained vision-language models in multimodal breast cancer prediction, offering superior performance and stability over conventional methods, reinforcing their potential for breast cancer prediction.</description><subject>Adaptation models</subject><subject>BI-RADS 3</subject><subject>Biological system modeling</subject><subject>Breast</subject><subject>Breast cancer</subject><subject>Data models</subject><subject>Decoding</subject><subject>Electronic Health Records (EHRs)</subject><subject>Foundation Models</subject><subject>Large Language Models</subject><subject>Large Vision Models</subject><subject>Mammograms</subject><subject>Mammography</subject><subject>Multimodal Learning</subject><subject>Predictive models</subject><subject>Tabular Data</subject><subject>Training</subject><subject>Vision-Language Learning</subject><subject>Visualization</subject><issn>2168-2194</issn><issn>2168-2208</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><recordid>eNpNkN1Kw0AQhRdRsNQ-gODFvkDq_iSb7KUtra1UFBRvw2QzW1fTrOymgl774Ca0gnMzw-GcOfARcsnZlHOmr-9mq_VUMJFOZcZyJYsTMhJcFYkQrDj9u7lOz8kkxjfWT9FLWo3IzzL4b2zpBsIWk2igQfoYsAvgWqzpi4vOt8kG2u0etkjvfY1NpBCQdq9IF9ai6dwn0qXftzV0vRkaOgPzXvkWqfWB3u-bzu18PegBIXZ0Dq3BMNTUzgyRC3JmoYk4Oe4xeVounuerZPNwu57fbBKjpEiUzGylUkSdZboQGrS2LMfciEqJSmQSqxpyo1Oja8RMGEgtS21RowSwlRwTfvhqgo8xoC0_gttB-Co5KweO5cCxHDiWR4595uqQcYj4z58rzbiQvxm1cco</recordid><startdate>20241126</startdate><enddate>20241126</enddate><creator>Vo, Hung Q.</creator><creator>Wang, Lin</creator><creator>Wong, Kelvin K.</creator><creator>Ezeana, Chika F.</creator><creator>Yu, Xiaohui</creator><creator>Yang, Wei</creator><creator>Chang, Jenny</creator><creator>Nguyen, Hien V.</creator><creator>Wong, Stephen 
T.C.</creator><general>IEEE</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20241126</creationdate><title>Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction</title><author>Vo, Hung Q. ; Wang, Lin ; Wong, Kelvin K. ; Ezeana, Chika F. ; Yu, Xiaohui ; Yang, Wei ; Chang, Jenny ; Nguyen, Hien V. ; Wong, Stephen T.C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c632-635fb64ee9559829a99f07e7c2b62b253ebda7c94c9dee52ca4f04f8de3aafb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Adaptation models</topic><topic>BI-RADS 3</topic><topic>Biological system modeling</topic><topic>Breast</topic><topic>Breast cancer</topic><topic>Data models</topic><topic>Decoding</topic><topic>Electronic Health Records (EHRs)</topic><topic>Foundation Models</topic><topic>Large Language Models</topic><topic>Large Vision Models</topic><topic>Mammograms</topic><topic>Mammography</topic><topic>Multimodal Learning</topic><topic>Predictive models</topic><topic>Tabular Data</topic><topic>Training</topic><topic>Vision-Language Learning</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vo, Hung Q.</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Wong, Kelvin K.</creatorcontrib><creatorcontrib>Ezeana, Chika F.</creatorcontrib><creatorcontrib>Yu, Xiaohui</creatorcontrib><creatorcontrib>Yang, Wei</creatorcontrib><creatorcontrib>Chang, Jenny</creatorcontrib><creatorcontrib>Nguyen, Hien V.</creatorcontrib><creatorcontrib>Wong, Stephen T.C.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society 
Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE journal of biomedical and health informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vo, Hung Q.</au><au>Wang, Lin</au><au>Wong, Kelvin K.</au><au>Ezeana, Chika F.</au><au>Yu, Xiaohui</au><au>Yang, Wei</au><au>Chang, Jenny</au><au>Nguyen, Hien V.</au><au>Wong, Stephen T.C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction</atitle><jtitle>IEEE journal of biomedical and health informatics</jtitle><stitle>JBHI</stitle><date>2024-11-26</date><risdate>2024</risdate><spage>1</spage><epage>14</epage><pages>1-14</pages><issn>2168-2194</issn><eissn>2168-2208</eissn><coden>IJBHA9</coden><abstract>Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases-including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs)-holds promise for improving prediction. This study introduces a multimodal deep-learning model leveraging mammogram datasets to evaluate breast cancer prediction. Our approach integrates frozen large-scale pretrained vision-language models, showcasing superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. The model consistently outperforms conventional full fine-tuning methods by using frozen pretrained vision-language models alongside a lightweight trainable classifier. The observed improvements are significant. In the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 during validation and from 0.803 to 0.830 for the official test set. 
Within the EMBED dataset, AUC improves from 0.780 to 0.805 during validation. In scenarios with limited data, using Breast Imaging-Reporting and Data System category three (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study underscores the benefits of vision-language models in jointly training diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges related to non-aligned tabular features. Combining training data enhances breast cancer prediction on the EMBED dataset, outperforming all other experiments. In summary, our research emphasizes the efficacy of frozen large-scale pretrained vision-language models in multimodal breast cancer prediction, offering superior performance and stability over conventional methods, reinforcing their potential for breast cancer prediction.</abstract><pub>IEEE</pub><doi>10.1109/JBHI.2024.3507638</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2168-2194
ispartof	IEEE journal of biomedical and health informatics, 2024-11, p.1-14
issn	2168-2194 (print); 2168-2208 (electronic)
language	eng
recordid	cdi_ieee_primary_10769012
source	IEEE Electronic Library (IEL)
subjects	Adaptation models; BI-RADS 3; Biological system modeling; Breast; Breast cancer; Data models; Decoding; Electronic Health Records (EHRs); Foundation Models; Large Language Models; Large Vision Models; Mammograms; Mammography; Multimodal Learning; Predictive models; Tabular Data; Training; Vision-Language Learning; Visualization
title	Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T18%3A19%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Frozen%20Large-scale%20Pretrained%20Vision-Language%20Models%20are%20the%20Effective%20Foundational%20Backbone%20for%20Multimodal%20Breast%20Cancer%20Prediction&rft.jtitle=IEEE%20journal%20of%20biomedical%20and%20health%20informatics&rft.au=Vo,%20Hung%20Q.&rft.date=2024-11-26&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=2168-2194&rft.eissn=2168-2208&rft.coden=IJBHA9&rft_id=info:doi/10.1109/JBHI.2024.3507638&rft_dat=%3Ccrossref_ieee_%3E10_1109_JBHI_2024_3507638%3C/crossref_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10769012&rfr_iscdi=true