Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in ima...

Bibliographic details
Main authors: He, Yongkang, Chen, Mingjin, Yang, Zhijing, Lu, Yongyi
Format: Article
Language: eng
creator He, Yongkang
Chen, Mingjin
Yang, Zhijing
Lu, Yongyi
description This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.
doi_str_mv 10.48550/arxiv.2308.01189
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2308_01189</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2308_01189</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-447c45e2b0ece3a6c96094b5a7775385e6430876fcba2bd13f3026091df7d9cb3</originalsourceid><addsrcrecordid>eNotj71OwzAUhb0woMIDMOEXcHDiv5gNpQUqtQKJDmzRtXMdWUpS5LoVvD1pYTrD-XR0PkLuSl7IWin-AOk7nopK8LrgZVnba_K5hAyswSmn6OkyYn6kqxDQ53hCuj0OOTI_t5jomTxgpu_pOMWpp2Gf6Ba76GGg6xF6pB_YjzMLOe6nG3IVYDjg7X8uyO55tWte2ebtZd08bRhoY5mUxkuFlePoUYD2VnMrnQJjjBK1Qi3ns0YH76ByXSmC4NWMlF0wnfVOLMj93-xFrf1KcYT0054V24ui-AV4IEvE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation</title><source>arXiv.org</source><creator>He, Yongkang ; Chen, Mingjin ; Yang, Zhijing ; Lu, Yongyi</creator><creatorcontrib>He, Yongkang ; Chen, Mingjin ; Yang, Zhijing ; Lu, Yongyi</creatorcontrib><description>This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. 
Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.</description><identifier>DOI: 10.48550/arxiv.2308.01189</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-08</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,782,887</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2308.01189$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2308.01189$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>He, Yongkang</creatorcontrib><creatorcontrib>Chen, Mingjin</creatorcontrib><creatorcontrib>Yang, Zhijing</creatorcontrib><creatorcontrib>Lu, Yongyi</creatorcontrib><title>Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation</title><description>This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. 
To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj71OwzAUhb0woMIDMOEXcHDiv5gNpQUqtQKJDmzRtXMdWUpS5LoVvD1pYTrD-XR0PkLuSl7IWin-AOk7nopK8LrgZVnba_K5hAyswSmn6OkyYn6kqxDQ53hCuj0OOTI_t5jomTxgpu_pOMWpp2Gf6Ba76GGg6xF6pB_YjzMLOe6nG3IVYDjg7X8uyO55tWte2ebtZd08bRhoY5mUxkuFlePoUYD2VnMrnQJjjBK1Qi3ns0YH76ByXSmC4NWMlF0wnfVOLMj93-xFrf1KcYT0054V24ui-AV4IEvE</recordid><startdate>20230802</startdate><enddate>20230802</enddate><creator>He, Yongkang</creator><creator>Chen, Mingjin</creator><creator>Yang, Zhijing</creator><creator>Lu, Yongyi</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230802</creationdate><title>Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation</title><author>He, Yongkang ; Chen, Mingjin ; Yang, Zhijing ; Lu, Yongyi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-447c45e2b0ece3a6c96094b5a7775385e6430876fcba2bd13f3026091df7d9cb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>He, 
Yongkang</creatorcontrib><creatorcontrib>Chen, Mingjin</creatorcontrib><creatorcontrib>Yang, Zhijing</creatorcontrib><creatorcontrib>Lu, Yongyi</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>He, Yongkang</au><au>Chen, Mingjin</au><au>Yang, Zhijing</au><au>Lu, Yongyi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation</atitle><date>2023-08-02</date><risdate>2023</risdate><abstract>This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.</abstract><doi>10.48550/arxiv.2308.01189</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2308.01189
language eng
recordid cdi_arxiv_primary_2308_01189
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
title Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-03T12%3A55%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data-Centric%20Diet:%20Effective%20Multi-center%20Dataset%20Pruning%20for%20Medical%20Image%20Segmentation&rft.au=He,%20Yongkang&rft.date=2023-08-02&rft_id=info:doi/10.48550/arxiv.2308.01189&rft_dat=%3Carxiv_GOX%3E2308_01189%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true