Combating noisy labels in object detection datasets
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The quality of training datasets for deep neural networks is a key factor
contributing to the accuracy of resulting models. This effect is amplified in
difficult tasks such as object detection. Dealing with errors in datasets is
often limited to accepting that some fraction of examples are incorrect,
estimating their confidence, and either assigning appropriate weights or
ignoring uncertain ones during training. In this work, we propose a different
approach. We introduce the Confident Learning for Object Detection (CLOD)
algorithm for assessing the quality of each label in object detection datasets,
identifying missing, spurious, mislabeled, and mislocated bounding boxes and
suggesting corrections. By focusing on finding incorrect examples in the
training datasets, we can eliminate them at the root. Suspicious bounding boxes
can be reviewed to improve the quality of the dataset, leading to better models
without further complicating their already complex architectures. The proposed
method is able to point out nearly 80% of artificially disturbed bounding boxes
with a false positive rate below 0.1. Cleaning the datasets by applying the
most confident automatic suggestions improved mAP scores by 16% to 46%,
depending on the dataset, without any modifications to the network
architectures. This approach shows promising potential in rectifying
state-of-the-art object detection datasets. |
---|---|
DOI: | 10.48550/arxiv.2211.13993 |
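
The summary names four kinds of annotation error: missing, spurious, mislabeled, and mislocated bounding boxes. The sketch below is not the CLOD algorithm, which the record does not describe in detail. It only illustrates, under stated assumptions, how comparing annotations with confident detector predictions by intersection over union (IoU) can separate those four categories. The `Box` type, the `flag_suspicious` function, and all thresholds are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; `score` is detector confidence (1.0 for annotations)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


Issue = Tuple[str, Optional[Box], Optional[Box]]


def flag_suspicious(
    annotations: List[Box],
    predictions: List[Box],
    min_score: float = 0.9,   # assumed: only trust confident predictions
    match_iou: float = 0.5,   # assumed: minimum overlap to count as a match
    tight_iou: float = 0.8,   # assumed: overlap below this is "mislocated"
) -> List[Issue]:
    """Return (issue, annotation, prediction) tuples for review."""
    confident = [p for p in predictions if p.score >= min_score]
    matched = set()
    issues: List[Issue] = []

    for ann in annotations:
        # Greedy match: the confident prediction overlapping this annotation most.
        best_idx, best_iou = None, 0.0
        for i, pred in enumerate(confident):
            overlap = iou(ann, pred)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap

        if best_idx is None or best_iou < match_iou:
            # No confident prediction supports this annotated box.
            issues.append(("spurious", ann, None))
            continue

        matched.add(best_idx)
        pred = confident[best_idx]
        if pred.label != ann.label:
            issues.append(("mislabeled", ann, pred))
        elif best_iou < tight_iou:
            issues.append(("mislocated", ann, pred))

    # Confident predictions with no annotated counterpart suggest a missing box.
    for i, pred in enumerate(confident):
        if i not in matched:
            issues.append(("missing", None, pred))
    return issues
```

Each returned tuple pairs an issue type with the annotation and the prediction involved, so a reviewer can inspect the suspicious box next to its suggested correction. A fuller treatment would calibrate the thresholds per class rather than fixing them globally.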