Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning

The automatic generation of accurate radiology reports is of great clinical importance and has drawn growing research interest. However, it is still a challenging task due to the imbalance between normal and abnormal descriptions and the multi-sentence and multi-topic nature of radiology reports. These features result in significant challenges to generating accurate descriptions for medical images, especially the important abnormal findings. Previous methods to tackle these problems rely heavily on extra manual annotations, which are expensive to acquire.

We propose a multi-grained report generation framework incorporating sentence-level image-sentence contrastive learning, which does not require any extra labeling but effectively learns knowledge from the image-report pairs. We first introduce contrastive learning as an auxiliary task for image feature learning. Different from previous contrastive methods, we exploit the multi-topic nature of imaging reports and perform fine-grained contrastive learning by extracting sentence topics and contents and contrasting between sentence contents and refined image contents guided by sentence topics. This forces the model to learn distinct abnormal image features for each specific topic. During generation, we use two decoders to first generate coarse sentence topics and then the fine-grained text of each sentence. We directly supervise the intermediate topics using sentence topics learned by our contrastive objective. This strengthens the generation constraint and enables independent fine-tuning of the decoders using reinforcement learning, which further boosts model performance.

Experiments on two large-scale datasets MIMIC-CXR and IU-Xray demonstrate that our approach outperforms existing state-of-the-art methods, evaluated by both language generation metrics and clinical accuracy.
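This record does not include the authors' code. As a rough, generic illustration only (not the paper's implementation), the sentence-level image-language contrastive objective described in the abstract can be sketched as a symmetric InfoNCE loss between sentence embeddings and topic-guided image embeddings, where row i of each matrix is assumed to be a matched pair; the function name and the temperature value are illustrative choices, not taken from the paper.

```python
import numpy as np

def sentence_image_info_nce(sentence_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between L2-normalized
    sentence embeddings and (topic-guided) image embeddings.

    sentence_emb, image_emb: (n, d) arrays; row i of each is a positive pair.
    Returns the mean of the sentence->image and image->sentence losses.
    """
    # L2-normalize so the dot product is a cosine similarity
    s = sentence_emb / np.linalg.norm(sentence_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = s @ v.T / temperature  # (n, n) scaled similarity matrix
    labels = np.arange(len(s))      # positives lie on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # contrast in both directions: sentence->image and image->sentence
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Under this sketch, perfectly aligned pairs drive the loss toward zero, while mismatched pairs are penalized, which is the mechanism the abstract credits with forcing distinct abnormal image features per topic.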

Bibliographic Details

Published in: IEEE transactions on medical imaging, 2024-07, Vol.43 (7), p.2657-2669
Authors: Liu, Aohan; Guo, Yuchen; Yong, Jun-Hai; Xu, Feng
Format: Article
Language: English
DOI: 10.1109/TMI.2024.3372638
PMID: 38437149
Publisher: IEEE, United States
ISSN: 0278-0062
EISSN: 1558-254X
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithms
Annotations
Biomedical imaging
contrastive learning
Databases, Factual
Decoders
Decoding
Descriptions
Humans
Image acquisition
Learning
Machine Learning
Medical imaging
Medical report generation
multi-grained
Natural Language Processing
Radiology
Radiology - methods
Radiology Information Systems
Reinforcement learning
Self-supervised learning
Sentences
State-of-the-art reviews
Task analysis
Training