Generalizability, robustness, and correction bias of segmentations of thoracic organs at risk in CT images

This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OAR): the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning. We compare a multi-organ segmentation approach and the fusion of multiple single-organ models, each dedicated to one OAR.
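The fusion of single-organ models mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular, the overlap rule (later organs overwrite earlier ones) is an assumption.

```python
import numpy as np

def fuse_single_organ_masks(masks, labels):
    """Fuse per-organ binary masks into one multi-organ label map.

    `masks` and `labels` are parallel lists; voxels claimed by several
    organs take the label of the last mask in the list (assumed rule).
    """
    fused = np.zeros(masks[0].shape, dtype=np.uint8)
    for mask, label in zip(masks, labels):
        fused[mask.astype(bool)] = label
    return fused

# Toy example: two 1-D "organ" masks with a one-voxel overlap
esophagus = np.array([1, 1, 0, 0, 0])
trachea   = np.array([0, 1, 1, 0, 0])
fused = fuse_single_organ_masks([esophagus, trachea], labels=[1, 2])
print(fused)  # [1 2 2 0 0]
```

In practice each mask would come from one dedicated nnU-Net model, and the fused label map would be compared against the multi-class model's output.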

Bibliographic details
Published in: European radiology, 2024-12
Main authors: Guérendel, Corentin; Petrychenko, Liliana; Chupetlovska, Kalina; Bodalal, Zuhir; Beets-Tan, Regina G H; Benson, Sean
Format: Article
Language: eng
Online access: Full text
container_title European radiology
creator Guérendel, Corentin; Petrychenko, Liliana; Chupetlovska, Kalina; Bodalal, Zuhir; Beets-Tan, Regina G H; Benson, Sean
description This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OAR): the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning. We compare a multi-organ segmentation approach and the fusion of multiple single-organ models, each dedicated to one OAR. All were trained using nnU-Net with the default parameters and the full-resolution configuration. We evaluate their robustness with adversarial perturbations and their generalizability on external datasets, and explore potential biases introduced by expert corrections compared to fully manual delineations. The two approaches show excellent performance, with an average Dice score of 0.928 for the multi-class setting and 0.930 when fusing the four single-organ models. The evaluation on external datasets and under common procedural adversarial noise demonstrates the good generalizability of these models. In addition, expert corrections of both models show significant bias toward the original automated segmentation. The average Dice score between the two corrections is 0.93, ranging from 0.88 for the trachea to 0.98 for the heart. Both approaches demonstrate excellent performance and generalizability in segmenting four thoracic OARs, potentially improving efficiency in radiotherapy planning. However, the multi-organ setting proves advantageous for its efficiency, requiring less training time and fewer resources, making it the preferable choice for this task. Moreover, corrections of AI segmentations by clinicians may introduce bias into evaluations of AI approaches; a fully manually annotated test set should therefore be used to assess the performance of such methods.
Question: While manual delineation of thoracic organs at risk is labor-intensive, error-prone, and time-consuming, the evaluation of AI models performing this task lacks robustness.
Findings: The deep-learning models built on the nnU-Net framework showed excellent performance, generalizability, and robustness in segmenting thoracic organs in CT, enhancing radiotherapy-planning efficiency.
Clinical relevance: Automatic segmentation of thoracic organs at risk can save clinicians time without compromising the quality of the delineations, and extensive evaluation across diverse settings demonstrates the potential of integrating such models into clinical practice.
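The Dice scores reported in the description can be computed as below. This is a minimal NumPy sketch of the standard Dice similarity coefficient, 2|A∩B| / (|A| + |B|), not the authors' evaluation code; the empty-mask convention is an assumption.

```python
import numpy as np

def dice_score(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement (convention)
    return 2.0 * np.logical_and(pred, ref).sum() / denom

# Toy 2-D example: two overlapping 4x4 squares on a 10x10 grid
a = np.zeros((10, 10), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((10, 10), dtype=bool); b[3:7, 3:7] = True
print(dice_score(a, b))  # 2*9 / (16 + 16) = 0.5625
```

In the study this metric is applied per organ (e.g., 0.88 for the trachea, 0.98 for the heart when comparing the two sets of expert corrections) and averaged across the four OARs.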
doi_str_mv 10.1007/s00330-024-11321-2
format Article
pmid 39738559
publisher Germany
rights 2024. The Author(s), under exclusive licence to European Society of Radiology.
orcidid https://orcid.org/0009-0001-2343-0922
fulltext fulltext
identifier ISSN: 1432-1084
ispartof European radiology, 2024-12
issn 1432-1084
language eng
recordid cdi_proquest_miscellaneous_3150521337
source SpringerLink Journals
title Generalizability, robustness, and correction bias of segmentations of thoracic organs at risk in CT images
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T12%3A56%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generalizability,%20robustness,%20and%20correction%20bias%20of%20segmentations%20of%20thoracic%20organs%20at%20risk%20in%20CT%20images&rft.jtitle=European%20radiology&rft.au=Gu%C3%A9rendel,%20Corentin&rft.date=2024-12-31&rft.issn=1432-1084&rft.eissn=1432-1084&rft_id=info:doi/10.1007/s00330-024-11321-2&rft_dat=%3Cproquest_cross%3E3150521337%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3150521337&rft_id=info:pmid/39738559&rfr_iscdi=true