Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
Main authors: | He, Yongkang; Chen, Mingjin; Yang, Zhijing; Lu, Yongyi |
---|---|
Format: | Article |
Language: | eng |
Online access: | Order full text |
field | value |
---|---|
creator | He, Yongkang; Chen, Mingjin; Yang, Zhijing; Lu, Yongyi |
description | This paper seeks to address dense labeling problems in which a significant
fraction of the dataset can be pruned without sacrificing much accuracy. We
observe that, on standard medical image segmentation benchmarks, the loss
gradient norm-based metrics of individual training examples used in image
classification fail to identify the important samples. To address this issue,
we propose a data pruning method that takes into account the training
dynamics on target regions using the Dynamic Average Dice (DAD) score. To the best
of our knowledge, we are among the first to address data importance in
dense labeling tasks in medical image analysis, making the
following contributions: (1) investigating the underlying causes with rigorous
empirical analysis, and (2) determining an effective data pruning approach for
dense labeling problems. Our solution can serve as a strong yet simple
baseline for selecting important examples for medical image segmentation with
combined data sources. |
doi_str_mv | 10.48550/arxiv.2308.01189 |
format | Article |
date | 2023-08-02 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
arXiv record | https://arxiv.org/abs/2308.01189 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2308.01189 |
language | eng |
recordid | cdi_arxiv_primary_2308_01189 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition |
title | Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-03T12%3A55%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data-Centric%20Diet:%20Effective%20Multi-center%20Dataset%20Pruning%20for%20Medical%20Image%20Segmentation&rft.au=He,%20Yongkang&rft.date=2023-08-02&rft_id=info:doi/10.48550/arxiv.2308.01189&rft_dat=%3Carxiv_GOX%3E2308_01189%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
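The description names the Dynamic Average Dice (DAD) score without defining it. The Python sketch that follows is one plausible reading under stated assumptions, not the authors' implementation: it assumes DAD is the Dice overlap on a case's target (foreground) region, averaged over the epochs at which predictions were recorded, and it assumes that low-DAD (harder) cases are the ones worth keeping. The function names `dice`, `dynamic_average_dice` and `prune_by_dad`, the `keep_fraction` and `keep_hardest` parameters, and the toy masks are all hypothetical.

```python
"""Hypothetical sketch of Dynamic Average Dice (DAD) based data pruning.

Assumptions (not taken from the paper): DAD is the mean foreground Dice of a
case over recorded epochs, and low-DAD (hard) cases are kept.
"""
from typing import Dict, List, Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap of two flattened binary masks (1 marks the target region)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * inter / total


def dynamic_average_dice(history: List[float]) -> float:
    """Hypothetical DAD: mean of one case's Dice scores across recorded epochs."""
    if not history:
        raise ValueError("need at least one recorded epoch")
    return sum(history) / len(history)


def prune_by_dad(
    histories: Dict[str, List[float]],
    keep_fraction: float,
    keep_hardest: bool = True,
) -> List[str]:
    """Rank cases by DAD and return the IDs of the retained subset."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = {cid: dynamic_average_dice(h) for cid, h in histories.items()}
    # Assumption: a low average Dice marks a hard, informative case, so such
    # cases come first (ascending order) when keep_hardest is True.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=not keep_hardest)
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy data: one ground-truth mask and two recorded epochs per hypothetical case.
    gt = [0, 1, 1, 1, 0, 0]
    epoch_preds = {
        "case_a": [[0, 1, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0]],  # fitted from the start
        "case_b": [[0, 0, 0, 1, 0, 0], [0, 1, 1, 1, 0, 0]],  # learned late
        "case_c": [[1, 0, 0, 0, 1, 1], [0, 0, 1, 0, 0, 0]],  # still hard
    }
    histories = {
        cid: [dice(p, gt) for p in preds] for cid, preds in epoch_preds.items()
    }
    print(prune_by_dad(histories, keep_fraction=2 / 3))  # ['case_c', 'case_b']
```

Averaging over epochs, rather than using only the final Dice, is what makes this score depend on training dynamics: in the toy run `case_b` (mean 0.75) ranks as harder than `case_a` (mean 1.0) even though both end at a Dice of 1.0.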