Combating noisy labels in object detection datasets

The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.

Full Description

Saved in:
Bibliographic Details
Main Authors: Chachuła, Krystian, Łyskawa, Jakub, Olber, Bartłomiej, Frątczak, Piotr, Popowicz, Adam, Radlak, Krystian
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Chachuła, Krystian
Łyskawa, Jakub
Olber, Bartłomiej
Frątczak, Piotr
Popowicz, Adam
Radlak, Krystian
description The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.
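The description above explains CLOD only at a high level: score each annotated box against a trained model's predictions, flag suspicious ones, and suggest corrections. As an illustration only (not the authors' CLOD implementation), the sketch below applies a minimal confident-learning-style rule to classification labels of boxes: a box is flagged when the model's predicted probability for its annotated class falls below that class's average self-confidence, and the model's most likely class is offered as a correction. The function name, the per-class threshold rule, and the input format (one row of class probabilities per box) are assumptions made for this sketch.

```python
import numpy as np

def flag_suspicious_labels(pred_probs, given_labels):
    """Confident-learning-style check for box class labels.

    pred_probs   : (n_boxes, n_classes) predicted class probabilities
                   for each annotated box, from a trained detector.
    given_labels : (n_boxes,) annotated class index of each box.

    Returns a boolean mask of suspicious boxes and, for every box,
    the model's most likely class as a suggested correction.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    given_labels = np.asarray(given_labels, dtype=int)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: mean predicted probability of class c over
    # the boxes that are annotated as class c (its "self-confidence").
    thresholds = np.array([
        pred_probs[given_labels == c, c].mean()
        if np.any(given_labels == c) else 1.0
        for c in range(n_classes)
    ])

    # A box is suspicious when its own self-confidence is below the
    # threshold of its annotated class.
    self_conf = pred_probs[np.arange(len(given_labels)), given_labels]
    suspicious = self_conf < thresholds[given_labels]

    # Suggested correction: the model's argmax class for each box.
    suggestions = pred_probs.argmax(axis=1)
    return suspicious, suggestions
```

In this toy form, a box labeled class 0 but predicted at only 0.2 probability for class 0 would be flagged and class 1 suggested instead; the paper's method additionally handles missing, spurious, and mislocated boxes, which this sketch does not attempt.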
doi_str_mv 10.48550/arxiv.2211.13993
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2211.13993
language eng
recordid cdi_arxiv_primary_2211_13993
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
title Combating noisy labels in object detection datasets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T11%3A09%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combating%20noisy%20labels%20in%20object%20detection%20datasets&rft.au=Chachu%C5%82a,%20Krystian&rft.date=2022-11-25&rft_id=info:doi/10.48550/arxiv.2211.13993&rft_dat=%3Carxiv_GOX%3E2211_13993%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true