Medical Image Description Based on Multimodal Auxiliary Signals and Transformer
Medical image description can be applied to clinical medical diagnosis, but the field still faces serious challenges. Medical datasets suffer from a serious visual and textual data bias: the distribution of healthy and diseased cases is imbalanced. This can greatly degrade the learning performance of data-driven neural networks and ultimately lead to errors in the generated medical image descriptions. To address this problem, we propose a new medical image description network architecture, the multimodal data-assisted knowledge fusion network (MDAKF), which introduces multimodal auxiliary signals to guide the Transformer network to generate more accurate medical reports. Specifically, audio auxiliary signals point to clearly abnormal visual regions, alleviating the visual data bias problem. However, audio signals with similar pronunciation are hard to distinguish, which may cause audio labels to be mapped to the wrong medical image regions. We therefore fuse the audio with text features to form the auxiliary signal, improving the overall performance of the model. Experiments on two medical image description datasets, IU-X-ray and COV-CTR, show that the proposed model outperforms previous models on language generation evaluation metrics.
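The core mechanism the abstract describes, fusing audio and text label features into an auxiliary signal that guides a Transformer decoder over visual regions, can be sketched in a few lines of PyTorch. The following is a minimal illustration of that idea only, not the authors' released implementation; all module names, feature sizes, and the concatenate-into-memory strategy are assumptions made for the example.

```python
# Minimal sketch of the abstract's idea: fuse audio and text auxiliary
# features via cross-attention, then let a Transformer decoder attend to
# image regions plus the fused auxiliary signal while generating report
# tokens. NOT the authors' code; names and sizes are illustrative.
import torch
import torch.nn as nn


class AuxiliaryFusion(nn.Module):
    """Cross-attend audio features over text features, fuse residually."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # audio: (B, La, dim), text: (B, Lt, dim)
        attended, _ = self.attn(query=audio, key=text, value=text)
        return self.norm(audio + attended)  # residual audio-text fusion


class ReportDecoder(nn.Module):
    """Transformer decoder conditioned on image regions + auxiliary signal."""

    def __init__(self, vocab_size: int = 10000, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.fusion = AuxiliaryFusion(dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.project = nn.Linear(dim, vocab_size)

    def forward(self, tokens, image_feats, audio_feats, text_feats):
        # Memory = visual region tokens concatenated with the fused
        # auxiliary tokens, so cross-attention sees both guidance sources.
        aux = self.fusion(audio_feats, text_feats)
        memory = torch.cat([image_feats, aux], dim=1)
        # Standard causal mask: each position attends only to the past.
        seq_len = tokens.size(1)
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        hidden = self.decoder(self.embed(tokens), memory, tgt_mask=causal)
        return self.project(hidden)  # (B, seq_len, vocab_size) logits


# Toy usage with random tensors standing in for real encoder outputs.
model = ReportDecoder()
logits = model(
    torch.randint(0, 10000, (2, 20)),  # partially generated report tokens
    torch.randn(2, 49, 512),           # image region features (e.g. CNN grid)
    torch.randn(2, 10, 512),           # audio label features
    torch.randn(2, 10, 512),           # text label features
)
print(logits.shape)  # torch.Size([2, 20, 10000])
```

Concatenating the fused auxiliary tokens with the image-region tokens lets the decoder's cross-attention weigh both sources at every generation step, which is one plausible reading of how the abstract's auxiliary signals could steer generation toward abnormal regions.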
Saved in:
Published in: International journal of intelligent systems, 2024-02, Vol. 2024, p. 1-12
Main authors: Tan, Yun; Li, Chunzhi; Qin, Jiaohua; Xue, Youyuan; Xiang, Xuyu
Format: Article
Language: English
Subjects: Audio data; Bias; Business metrics; Datasets; Deep learning; Medical imaging; Medical research; Neural networks; Radiology; Researchers; Transformers
Online access: Full text
Full record:

| field | value |
|---|---|
| title | Medical Image Description Based on Multimodal Auxiliary Signals and Transformer |
| creators | Tan, Yun; Li, Chunzhi; Qin, Jiaohua; Xue, Youyuan; Xiang, Xuyu |
| contributor | Costa Gianni |
| ispartof | International journal of intelligent systems, 2024-02, Vol. 2024, p. 1-12 |
| publisher | New York: Hindawi |
| date | 2024-02-13 |
| format | Article |
| language | English |
| issn | 0884-8173 |
| eissn | 1098-111X |
| doi | 10.1155/2024/6680546 |
| subjects | Audio data; Bias; Business metrics; Datasets; Deep learning; Medical imaging; Medical research; Neural networks; Radiology; Researchers; Transformers |
| orcid | 0000-0002-9855-8234; 0000-0003-0695-2283; 0000-0002-7549-7731 |
| rights | Copyright © 2024 Yun Tan et al. Open access under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0 |