Combating noisy labels in object detection datasets

The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.

Full Description

Saved in:
Bibliographic Details
Main Authors: Chachuła, Krystian, Łyskawa, Jakub, Olber, Bartłomiej, Frątczak, Piotr, Popowicz, Adam, Radlak, Krystian
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Chachuła, Krystian
Łyskawa, Jakub
Olber, Bartłomiej
Frątczak, Piotr
Popowicz, Adam
Radlak, Krystian
description The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.
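The description above explains CLOD only at a high level: score each annotated box against a trained model's predictions, flag suspicious ones, and suggest corrections. As an illustration only (not the authors' CLOD implementation), the sketch below applies a minimal confident-learning-style rule to classification labels of boxes: a box is flagged when the model's predicted probability for its annotated class falls below that class's average self-confidence, and the model's most likely class is offered as a correction. The function name, the per-class threshold rule, and the input format (one row of class probabilities per box) are assumptions made for this sketch.

```python
import numpy as np

def flag_suspicious_labels(pred_probs, given_labels):
    """Confident-learning-style check for box class labels.

    pred_probs   : (n_boxes, n_classes) predicted class probabilities
                   for each annotated box, from a trained detector.
    given_labels : (n_boxes,) annotated class index of each box.

    Returns a boolean mask of suspicious boxes and, for every box,
    the model's most likely class as a suggested correction.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    given_labels = np.asarray(given_labels, dtype=int)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: mean predicted probability of class c over
    # the boxes that are annotated as class c (its "self-confidence").
    thresholds = np.array([
        pred_probs[given_labels == c, c].mean()
        if np.any(given_labels == c) else 1.0
        for c in range(n_classes)
    ])

    # A box is suspicious when its own self-confidence is below the
    # threshold of its annotated class.
    self_conf = pred_probs[np.arange(len(given_labels)), given_labels]
    suspicious = self_conf < thresholds[given_labels]

    # Suggested correction: the model's argmax class for each box.
    suggestions = pred_probs.argmax(axis=1)
    return suspicious, suggestions
```

In this toy form, a box labeled class 0 but predicted at only 0.2 probability for class 0 would be flagged and class 1 suggested instead; the paper's method additionally handles missing, spurious, and mislocated boxes, which this sketch does not attempt.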
doi_str_mv 10.48550/arxiv.2211.13993
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2211.13993
language eng
recordid cdi_arxiv_primary_2211_13993
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
title Combating noisy labels in object detection datasets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T11%3A09%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combating%20noisy%20labels%20in%20object%20detection%20datasets&rft.au=Chachu%C5%82a,%20Krystian&rft.date=2022-11-25&rft_id=info:doi/10.48550/arxiv.2211.13993&rft_dat=%3Carxiv_GOX%3E2211_13993%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true