DiffusionDet: Diffusion Model for Object Detection

We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. During the training stage, object boxes diffuse from ground-truth boxes to random distribution, and the model learns to reverse this noising process. In inference, the model refines a set of randomly generated boxes to the output results in a progressive way. Our work possesses an appealing property of flexibility, which enables the dynamic number of boxes and iterative evaluation. The extensive experiments on the standard benchmarks show that DiffusionDet achieves favorable performance compared to previous well-established detectors. For example, DiffusionDet achieves 5.3 AP and 4.8 AP gains when evaluated with more boxes and iteration steps, under a zero-shot transfer setting from COCO to CrowdHuman. Our code is available at https://github.com/ShoufaChen/DiffusionDet.
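The training-stage formulation described above can be sketched in a few lines: ground-truth boxes are mixed with Gaussian noise according to a cumulative signal coefficient that shrinks as the timestep grows. The sketch below is a minimal illustration, not the repository's implementation; the cosine schedule, the box parameterization (cx, cy, w, h in [0, 1]), and the function names are assumptions for demonstration.

```python
import math
import random

def alpha_bar(t, T=1000):
    # Cosine cumulative-signal schedule (assumed here for illustration):
    # returns a coefficient near 1 at t=0 that decays to ~0 at t=T.
    return math.cos((t / T + 0.008) / 1.008 * math.pi / 2) ** 2

def corrupt_boxes(boxes, t, T=1000):
    """Forward diffusion for training: mix ground-truth boxes with noise.

    `boxes` is a list of (cx, cy, w, h) tuples normalized to [0, 1].
    At small t the output stays close to the ground truth; at large t
    it approaches pure Gaussian noise, which is what the detector
    learns to reverse.
    """
    a = math.sqrt(alpha_bar(t, T))          # weight on the signal
    b = math.sqrt(1.0 - alpha_bar(t, T))    # weight on the noise
    return [
        tuple(a * c + b * random.gauss(0.0, 1.0) for c in box)
        for box in boxes
    ]
```

At inference, the same schedule is traversed in reverse: starting from purely random boxes, the model repeatedly predicts cleaner boxes and re-noises them to the next (smaller) timestep, which is what allows the number of boxes and the number of iteration steps to be chosen freely at evaluation time.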

Bibliographic details

Main authors: Chen, Shoufa; Sun, Peize; Song, Yibing; Luo, Ping
Format: Article
Language: eng
Date: 2022-11-17
Online access: https://arxiv.org/abs/2211.09788
DOI: 10.48550/arxiv.2211.09788
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition