Cal-DETR: Calibrated Detection Transformer

Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrat...

Bibliographic Details
Main authors:	Munir, Muhammad Akhtar; Khan, Salman; Khan, Muhammad Haris; Ali, Mohsen; Khan, Fahad Shahbaz
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Munir, Muhammad Akhtar; Khan, Salman; Khan, Muhammad Haris; Ali, Mohsen; Khan, Fahad Shahbaz
description	Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs; however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. Our codebase and pre-trained models can be accessed at https://github.com/akhtarvision/cal-detr.
doi_str_mv	10.48550/arxiv.2311.03570
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2311_03570</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2311_03570</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-4c73ae6c8536b067c2cc1e4e41dc94398d0631376083593f35896516bd3466783</originalsourceid><addsrcrecordid>eNotzr0KwjAUhuEsDqJegJOdhdakJzlJ3aStP1AQJHtJ0xQKaiUtonfv7_S908dDyJzRiCsh6Mr4R3uPYmAsoiAkHZNlas5hluvTOnhXW3kzuDrI3ODs0HbXQHtz7ZvOX5yfklFjzr2b_XdC9DbX6T4sjrtDuilCg5KG3EowDq0SgBVFaWNrmeOOs9omHBJVUwQGEqkCkUADQiUoGFY1cESpYEIWv9svtrz59mL8s_ygyy8aXpMrOVk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Cal-DETR: Calibrated Detection Transformer</title><source>arXiv.org</source><creator>Munir, Muhammad Akhtar ; Khan, Salman ; Khan, Muhammad Haris ; Ali, Mohsen ; Khan, Fahad Shahbaz</creator><creatorcontrib>Munir, Muhammad Akhtar ; Khan, Salman ; Khan, Muhammad Haris ; Ali, Mohsen ; Khan, Fahad Shahbaz</creatorcontrib><description>Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. 
First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. Our codebase and pre-trained models can be accessed at \url{https://github.com/akhtarvision/cal-detr}.</description><identifier>DOI: 10.48550/arxiv.2311.03570</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-11</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2311.03570$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2311.03570$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Munir, Muhammad Akhtar</creatorcontrib><creatorcontrib>Khan, Salman</creatorcontrib><creatorcontrib>Khan, Muhammad Haris</creatorcontrib><creatorcontrib>Ali, Mohsen</creatorcontrib><creatorcontrib>Khan, Fahad Shahbaz</creatorcontrib><title>Cal-DETR: Calibrated Detection 
Transformer</title><description>Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. 
Our codebase and pre-trained models can be accessed at \url{https://github.com/akhtarvision/cal-detr}.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzr0KwjAUhuEsDqJegJOdhdakJzlJ3aStP1AQJHtJ0xQKaiUtonfv7_S908dDyJzRiCsh6Mr4R3uPYmAsoiAkHZNlas5hluvTOnhXW3kzuDrI3ODs0HbXQHtz7ZvOX5yfklFjzr2b_XdC9DbX6T4sjrtDuilCg5KG3EowDq0SgBVFaWNrmeOOs9omHBJVUwQGEqkCkUADQiUoGFY1cESpYEIWv9svtrz59mL8s_ygyy8aXpMrOVk</recordid><startdate>20231106</startdate><enddate>20231106</enddate><creator>Munir, Muhammad Akhtar</creator><creator>Khan, Salman</creator><creator>Khan, Muhammad Haris</creator><creator>Ali, Mohsen</creator><creator>Khan, Fahad Shahbaz</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231106</creationdate><title>Cal-DETR: Calibrated Detection Transformer</title><author>Munir, Muhammad Akhtar ; Khan, Salman ; Khan, Muhammad Haris ; Ali, Mohsen ; Khan, Fahad Shahbaz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-4c73ae6c8536b067c2cc1e4e41dc94398d0631376083593f35896516bd3466783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Munir, Muhammad Akhtar</creatorcontrib><creatorcontrib>Khan, Salman</creatorcontrib><creatorcontrib>Khan, Muhammad Haris</creatorcontrib><creatorcontrib>Ali, Mohsen</creatorcontrib><creatorcontrib>Khan, Fahad Shahbaz</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Munir, Muhammad Akhtar</au><au>Khan, Salman</au><au>Khan, 
Muhammad Haris</au><au>Ali, Mohsen</au><au>Khan, Fahad Shahbaz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cal-DETR: Calibrated Detection Transformer</atitle><date>2023-11-06</date><risdate>2023</risdate><abstract>Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. 
Our codebase and pre-trained models can be accessed at \url{https://github.com/akhtarvision/cal-detr}.</abstract><doi>10.48550/arxiv.2311.03570</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2311.03570
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2311_03570
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Cal-DETR: Calibrated Detection Transformer
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T16%3A45%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cal-DETR:%20Calibrated%20Detection%20Transformer&rft.au=Munir,%20Muhammad%20Akhtar&rft.date=2023-11-06&rft_id=info:doi/10.48550/arxiv.2311.03570&rft_dat=%3Carxiv_GOX%3E2311_03570%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true