CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

Bibliographic details
Main authors:	Shi, Hao; Pang, Chengshan; Zhang, Jiaming; Yang, Kailun; Wu, Yuhao; Ni, Huajian; Lin, Yining; Stiefelhagen, Rainer; Wang, Kaiwei
Format:	Article
Language:	English
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
Online access:	Order full text

creator	Shi, Hao; Pang, Chengshan; Zhang, Jiaming; Yang, Kailun; Wu, Yuhao; Ni, Huajian; Lin, Yining; Stiefelhagen, Rainer; Wang, Kaiwei
description	Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems: it extends the perception range beyond the limits of vision-centric vehicles and enhances road safety. While previous studies use only depth or only height information, we find that both matter and that they are in fact complementary. The depth feature encodes precise geometric cues, whereas the height feature mainly distinguishes between categories of height intervals, essentially providing semantic context. This insight motivates Complementary-BEV (CoBEV), a novel end-to-end monocular 3D object detection framework that integrates depth and height to construct robust bird's-eye-view (BEV) representations. In essence, CoBEV estimates each pixel's depth and height distributions, lifts the camera features into 3D space, and fuses them laterally with the newly proposed two-stage complementary feature selection (CFS) module. A BEV feature distillation framework is also integrated to further improve detection accuracy using prior knowledge from a fusion-modal CoBEV teacher. Extensive experiments on the public roadside camera-based 3D detection benchmarks DAIR-V2X-I and Rope3D, as well as the private Supremind-Road dataset, show that CoBEV not only achieves new state-of-the-art accuracy but also significantly improves robustness over previous methods in challenging long-distance scenarios and under noisy camera disturbance, and improves generalization by a large margin in heterologous settings with drastic changes in scene and camera parameters. For the first time, a camera-based model reaches 80% vehicle AP on DAIR-V2X-I in the easy mode. The source code will be made publicly available at https://github.com/MasterHow/CoBEV.
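The description outlines the pipeline only at a high level: per-pixel depth and height distributions, lifting camera features into 3D, fusion, and distillation from a teacher. The block below is a minimal pure-Python sketch of the general lift-and-splat idea for a single pixel, not the authors' code. Everything in it is a hypothetical illustration: a known camera position and unit viewing ray in a ground frame with the road at z = 0, depth measured along the ray, 1 m BEV cells, and no intermediate 3D voxel stage. `blend` is a fixed convex mix standing in for the CFS module, and `distill_mse` is a generic feature-mimicking loss standing in for the paper's distillation objective; neither reproduces the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Sparse BEV grid: (cell_x, cell_y) -> accumulated feature vector.
BEV = Dict[Tuple[int, int], List[float]]


def splat(bev: BEV, point: Sequence[float], feature: Sequence[float],
          weight: float, cell: float = 1.0) -> None:
    """Add weight * feature to the BEV cell that contains point's (x, y)."""
    key = (math.floor(point[0] / cell), math.floor(point[1] / cell))
    acc = bev.setdefault(key, [0.0] * len(feature))
    for i, f in enumerate(feature):
        acc[i] += weight * f


def lift_by_depth(bev: BEV, cam: Sequence[float], ray: Sequence[float],
                  feature: Sequence[float], depth_bins: Sequence[float],
                  depth_prob: Sequence[float], cell: float = 1.0) -> None:
    """Depth lift: for each bin d, put the feature at cam + d * ray (unit ray)."""
    for d, p in zip(depth_bins, depth_prob):
        point = [c + d * r for c, r in zip(cam, ray)]
        splat(bev, point, feature, p, cell)


def lift_by_height(bev: BEV, cam: Sequence[float], ray: Sequence[float],
                   feature: Sequence[float], height_bins: Sequence[float],
                   height_prob: Sequence[float], cell: float = 1.0) -> None:
    """Height lift: put the feature where the ray is h above the ground (z = 0)."""
    if ray[2] == 0:
        return  # a horizontal ray never changes height
    for h, q in zip(height_bins, height_prob):
        t = (h - cam[2]) / ray[2]  # ray parameter at which the point's z equals h
        if t <= 0:
            continue  # that height is only reached behind the camera
        point = [c + t * r for c, r in zip(cam, ray)]
        splat(bev, point, feature, q, cell)


def blend(bev_d: BEV, bev_h: BEV, dim: int, alpha: float = 0.5) -> BEV:
    """Hypothetical fusion: fixed per-cell convex blend, NOT the paper's CFS module."""
    zero = [0.0] * dim
    return {
        k: [alpha * a + (1 - alpha) * b
            for a, b in zip(bev_d.get(k, zero), bev_h.get(k, zero))]
        for k in set(bev_d) | set(bev_h)
    }


def distill_mse(student: BEV, teacher: BEV, dim: int) -> float:
    """Assumed distillation loss: mean squared error over the union of occupied cells."""
    zero = [0.0] * dim
    keys = set(student) | set(teacher)
    if not keys:
        return 0.0
    total = sum((s - t) ** 2
                for k in keys
                for s, t in zip(student.get(k, zero), teacher.get(k, zero)))
    return total / (len(keys) * dim)


# Toy example for one pixel: hypothetical camera on a 6 m pole.
cam = (0.0, 0.0, 6.0)
ray = (0.8, 0.0, -0.6)  # unit vector pointing forward and down
feature = [1.0, 2.0]    # 2-channel image feature of this pixel
bev_d: BEV = {}
bev_h: BEV = {}
lift_by_depth(bev_d, cam, ray, feature, [5.0, 10.0], [0.25, 0.75])
lift_by_height(bev_h, cam, ray, feature, [0.0, 3.0], [0.5, 0.5])
fused = blend(bev_d, bev_h, dim=2)
print(sorted(fused.items()))  # [((4, 0), [0.375, 0.75]), ((8, 0), [0.625, 1.25])]
```

In this toy case both lifts place the pixel's mass in the same cells (x = 4 m and x = 8 m) but with different weights. A learned, per-cell selection such as the paper's CFS module would decide which estimate to trust where; the fixed `alpha` here cannot.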
doi_str_mv	10.48550/arxiv.2310.02815
format	Article
fullrecord	source arXiv.org; created 2023-10-04; rights http://arxiv.org/licenses/nonexclusive-distrib/1.0; open access (free to read); https://arxiv.org/abs/2310.02815
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2310.02815
language	eng
recordid	cdi_arxiv_primary_2310_02815
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
title	CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T10%3A12%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CoBEV:%20Elevating%20Roadside%203D%20Object%20Detection%20with%20Depth%20and%20Height%20Complementarity&rft.au=Shi,%20Hao&rft.date=2023-10-04&rft_id=info:doi/10.48550/arxiv.2310.02815&rft_dat=%3Carxiv_GOX%3E2310_02815%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true