Cal-DETR: Calibrated Detection Transformer

Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrat...

Detailed description

Bibliographic details
Main authors: Munir, Muhammad Akhtar; Khan, Salman; Khan, Muhammad Haris; Ali, Mohsen; Khan, Fahad Shahbaz
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Munir, Muhammad Akhtar
Khan, Salman
Khan, Muhammad Haris
Ali, Mohsen
Khan, Fahad Shahbaz
description Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs; however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. Our codebase and pre-trained models can be accessed at https://github.com/akhtarvision/cal-detr.
doi_str_mv 10.48550/arxiv.2311.03570
format Article
creationdate 2023-11-06
rights http://creativecommons.org/licenses/by/4.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2311.03570
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2311.03570
ispartof
issn
language eng
recordid cdi_arxiv_primary_2311_03570
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Cal-DETR: Calibrated Detection Transformer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T16%3A45%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cal-DETR:%20Calibrated%20Detection%20Transformer&rft.au=Munir,%20Muhammad%20Akhtar&rft.date=2023-11-06&rft_id=info:doi/10.48550/arxiv.2311.03570&rft_dat=%3Carxiv_GOX%3E2311_03570%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true