CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we fin...

Bibliographic details
Main authors: Shi, Hao; Pang, Chengshan; Zhang, Jiaming; Yang, Kailun; Wu, Yuhao; Ni, Huajian; Lin, Yining; Stiefelhagen, Rainer; Wang, Kaiwei
Format: Article
Language: eng
creator Shi, Hao
Pang, Chengshan
Zhang, Jiaming
Yang, Kailun
Wu, Yuhao
Ni, Huajian
Lin, Yining
Stiefelhagen, Rainer
Wang, Kaiwei
description Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems: it extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. Previous studies are limited by using only depth or only height information; we find that both depth and height matter and that they are in fact complementary. The depth feature encompasses precise geometric cues, whereas the height feature primarily distinguishes between categories of height intervals, essentially providing semantic context. This insight motivates Complementary-BEV (CoBEV), a novel end-to-end monocular 3D object detection framework that integrates depth and height to construct robust BEV representations. In essence, CoBEV estimates each pixel's depth and height distributions and lifts the camera features into 3D space for lateral fusion using the newly proposed two-stage complementary feature selection (CFS) module. A BEV feature distillation framework is also integrated to further enhance detection accuracy using the prior knowledge of the fusion-modal CoBEV teacher. We conduct extensive experiments on the public roadside camera-based 3D detection benchmarks DAIR-V2X-I and Rope3D, as well as the private Supremind-Road dataset, demonstrating that CoBEV not only achieves new state-of-the-art accuracy, but also significantly improves on the robustness of previous methods in challenging long-distance scenarios and under noisy camera disturbance, and enhances generalization by a large margin in heterologous settings with drastic changes in scene and camera parameters. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in easy mode. The source code will be made publicly available at https://github.com/MasterHow/CoBEV.
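The description says that each pixel's depth and height distributions are estimated and used to lift camera features into 3D before fusion. The sketch below illustrates only that general idea for a single pixel, and it is not the authors' implementation: `lift_pixel` and `blend` are hypothetical names, the sigmoid gate is a stand-in that should not be read as the CFS module, and the camera geometry that would map both volumes onto a common BEV grid is omitted entirely.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of bin scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lift_pixel(feature, bin_logits):
    """Spread one pixel's feature vector over its bins (hypothetical helper).

    feature:    list of C channel values for the pixel
    bin_logits: list of B unnormalised scores over depth (or height) bins
    returns:    B x C nested list; row b is the feature weighted by P(bin b)
    """
    probs = softmax(bin_logits)
    return [[p * f for f in feature] for p in probs]

def blend(depth_cell, height_cell, gate_logit):
    """Assumed sigmoid-gated blend of two feature vectors for one BEV cell.

    This is a placeholder fusion rule, not the CFS module from the paper.
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return [gate * d + (1.0 - gate) * h for d, h in zip(depth_cell, height_cell)]

# Illustrative values only: one pixel with 3 channels.
feature = [1.0, -2.0, 0.5]
depth_volume = lift_pixel(feature, [0.1, 2.0, -1.0, 0.3])   # 4 depth bins
height_volume = lift_pixel(feature, [1.5, 0.0])             # 2 height bins
```

Because the bin probabilities sum to one, summing either volume over its bins recovers the original feature vector; lifting redistributes the feature along the ray rather than rescaling it. In a full pipeline the two volumes would first be projected to the same BEV cell before anything like `blend` could be applied.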
doi_str_mv 10.48550/arxiv.2310.02815
format Article
creationdate 2023-10-04
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2310.02815
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.02815
language eng
recordid cdi_arxiv_primary_2310_02815
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Robotics
title CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T10%3A12%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CoBEV:%20Elevating%20Roadside%203D%20Object%20Detection%20with%20Depth%20and%20Height%20Complementarity&rft.au=Shi,%20Hao&rft.date=2023-10-04&rft_id=info:doi/10.48550/arxiv.2310.02815&rft_dat=%3Carxiv_GOX%3E2310_02815%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true