Medical Image Description Based on Multimodal Auxiliary Signals and Transformer

Medical image description can be applied in clinical medical diagnosis, but the field still faces serious challenges. Medical datasets suffer from severe visual and textual data bias, namely an imbalanced distribution of healthy and diseased samples. This bias can greatly degrade the learning performance of data-driven neural networks and ultimately lead to errors in the generated medical image descriptions. To address this problem, we propose a new medical image description network architecture, the multimodal data-assisted knowledge fusion network (MDAKF), which introduces multimodal auxiliary signals to guide the Transformer network to generate more accurate medical reports. Specifically, audio auxiliary signals indicate clear abnormal visual regions, alleviating the visual data bias. However, audio signals with similar pronunciations lack recognizability, which may lead to incorrect mappings from audio labels to medical image regions. We therefore fuse the audio features with text features to form the auxiliary signal, improving the overall performance of the model. Experiments on two medical image description datasets, IU-X-ray and COV-CTR, show that the proposed model outperforms previous models on language generation evaluation metrics.
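To illustrate the kind of architecture the abstract describes — auxiliary audio and text features fused into a guidance signal for a Transformer report decoder — here is a minimal PyTorch sketch. The class name, dimensions, and the concatenate-and-project fusion scheme are assumptions for illustration only; the record does not reproduce the paper's actual MDAKF design.

```python
# Minimal sketch, NOT the paper's implementation: audio and text auxiliary
# features are fused and appended to the visual memory of a Transformer
# decoder that generates report tokens. All names and sizes are hypothetical.
import torch
import torch.nn as nn


class AuxiliaryFusionDecoder(nn.Module):
    """Toy report decoder guided by a fused audio+text auxiliary signal."""

    def __init__(self, vocab_size=2000, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Hypothetical fusion: concatenate audio and text features, project back.
        self.fuse = nn.Linear(2 * d_model, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, vis_feats, audio_feat, text_feat, tokens):
        # vis_feats: (B, N, d) region features from an image encoder
        # audio_feat, text_feat: (B, d) auxiliary-signal embeddings
        # tokens: (B, T) report token ids generated so far
        aux = self.fuse(torch.cat([audio_feat, text_feat], dim=-1))
        # Append the fused guidance vector as one extra memory token.
        memory = torch.cat([vis_feats, aux.unsqueeze(1)], dim=1)
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        hidden = self.decoder(self.embed(tokens), memory, tgt_mask=causal)
        return self.out(hidden)  # (B, T, vocab_size) next-token logits


# Smoke test with random stand-ins for encoder outputs.
model = AuxiliaryFusionDecoder()
logits = model(torch.randn(2, 49, 256), torch.randn(2, 256),
               torch.randn(2, 256), torch.randint(0, 2000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 2000])
```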

Detailed Description

Saved in:
Bibliographic Details
Published in: International journal of intelligent systems, 2024-02, Vol. 2024, p. 1-12
Main authors: Tan, Yun; Li, Chunzhi; Qin, Jiaohua; Xue, Youyuan; Xiang, Xuyu
Format: Article
Language: English
Subjects: Audio data; Bias; Business metrics; Datasets; Deep learning; Medical imaging; Medical research; Neural networks; Radiology; Researchers; Transformers
Online access: Full text
DOI: 10.1155/2024/6680546
Publisher: Hindawi (New York)
Publication date: 2024-02-13
Rights: Copyright © 2024 Yun Tan et al. Open access under the Creative Commons Attribution License (CC BY 4.0).
ISSN: 0884-8173
EISSN: 1098-111X
Source: ProQuest Central Essentials; ProQuest Central (Alumni Edition); ProQuest Central Student; ProQuest Central Korea; ProQuest Central UK/Ireland; Wiley Online Library (Open Access Collection); Alma/SFX Local Collection; ProQuest Central
Full-text link: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T00%3A47%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Medical%20Image%20Description%20Based%20on%20Multimodal%20Auxiliary%20Signals%20and%20Transformer&rft.jtitle=International%20journal%20of%20intelligent%20systems&rft.au=Tan,%20Yun&rft.date=2024-02-13&rft.volume=2024&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=0884-8173&rft.eissn=1098-111X&rft_id=info:doi/10.1155/2024/6680546&rft_dat=%3Cproquest_cross%3E2931377591%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2931377591&rft_id=info:pmid/&rfr_iscdi=true