DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments

Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present...

Bibliographic Details
Main authors:	Wang, Xijun; Sandoval-Segura, Pedro; Zhang, Chengyuan; Huang, Junyun; Guan, Tianrui; Xian, Ruiqi; Liu, Fuxiao; Chandra, Rohan; Gong, Boqing; Manocha, Dinesh
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Wang, Xijun ; Sandoval-Segura, Pedro ; Zhang, Chengyuan ; Huang, Junyun ; Guan, Tianrui ; Xian, Ruiqi ; Liu, Fuxiao ; Chandra, Rohan ; Gong, Boqing ; Manocha, Dinesh
description	Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g. pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.), which require high reasoning ability. DAVE densely annotates over 13 million bounding boxes (bboxes) actors with identification, and more than 1.6 million boxes are annotated with both actor identification and action/behavior details. The videos within DAVE are collected based on a broad spectrum of factors, such as weather conditions, the time of day, road scenarios, and traffic density. DAVE can benchmark video tasks like Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment retrieval, and Multi-label Video Action Recognition. Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, in DAVE, vulnerable road users constitute 41.13% of instances, compared to 23.71% in Waymo. DAVE provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.
doi_str_mv	10.48550/arxiv.2412.20042
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_20042</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_20042</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_200423</originalsourceid><addsrcrecordid>eNqFjj0OgkAQhbexMOoBrJwLiICYGDsiGGqitGTUUSZZdsnuingDjy0Se6uXvJ-ZT4h54HvRdrPxV2g6br0wCkIv9P0oHIt3EhfpDhJuyViC2OmaL1CwfaCEVFJNyllI0KElB092FWR8ryCnxpDtQ3SsFegbFA-pyOBZEuQar3Cy_UVgBXtdN5I6QNWbqp9d-eKGXqpaNloNP6ZidENpafbTiVgc0uM-Ww7MZWO4RvMqv-zlwL7-3_gAfIFQxw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments</title><source>arXiv.org</source><creator>Wang, Xijun ; Sandoval-Segura, Pedro ; Zhang, Chengyuan ; Huang, Junyun ; Guan, Tianrui ; Xian, Ruiqi ; Liu, Fuxiao ; Chandra, Rohan ; Gong, Boqing ; Manocha, Dinesh</creator><creatorcontrib>Wang, Xijun ; Sandoval-Segura, Pedro ; Zhang, Chengyuan ; Huang, Junyun ; Guan, Tianrui ; Xian, Ruiqi ; Liu, Fuxiao ; Chandra, Rohan ; Gong, Boqing ; Manocha, Dinesh</creatorcontrib><description>Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g. pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) 
and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.), which require high reasoning ability. DAVE densely annotates over 13 million bounding boxes (bboxes) actors with identification, and more than 1.6 million boxes are annotated with both actor identification and action/behavior details. The videos within DAVE are collected based on a broad spectrum of factors, such as weather conditions, the time of day, road scenarios, and traffic density. DAVE can benchmark video tasks like Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment retrieval, and Multi-label Video Action Recognition. Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, in DAVE, vulnerable road users constitute 41.13% of instances, compared to 23.71% in Waymo. DAVE provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.</description><identifier>DOI: 10.48550/arxiv.2412.20042</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2024-12</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.20042$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.20042$$DView paper in 
arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Xijun</creatorcontrib><creatorcontrib>Sandoval-Segura, Pedro</creatorcontrib><creatorcontrib>Zhang, Chengyuan</creatorcontrib><creatorcontrib>Huang, Junyun</creatorcontrib><creatorcontrib>Guan, Tianrui</creatorcontrib><creatorcontrib>Xian, Ruiqi</creatorcontrib><creatorcontrib>Liu, Fuxiao</creatorcontrib><creatorcontrib>Chandra, Rohan</creatorcontrib><creatorcontrib>Gong, Boqing</creatorcontrib><creatorcontrib>Manocha, Dinesh</creatorcontrib><title>DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments</title><description>Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g. pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.), which require high reasoning ability. DAVE densely annotates over 13 million bounding boxes (bboxes) actors with identification, and more than 1.6 million boxes are annotated with both actor identification and action/behavior details. The videos within DAVE are collected based on a broad spectrum of factors, such as weather conditions, the time of day, road scenarios, and traffic density. DAVE can benchmark video tasks like Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment retrieval, and Multi-label Video Action Recognition. 
Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, in DAVE, vulnerable road users constitute 41.13% of instances, compared to 23.71% in Waymo. DAVE provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjj0OgkAQhbexMOoBrJwLiICYGDsiGGqitGTUUSZZdsnuingDjy0Se6uXvJ-ZT4h54HvRdrPxV2g6br0wCkIv9P0oHIt3EhfpDhJuyViC2OmaL1CwfaCEVFJNyllI0KElB092FWR8ryCnxpDtQ3SsFegbFA-pyOBZEuQar3Cy_UVgBXtdN5I6QNWbqp9d-eKGXqpaNloNP6ZidENpafbTiVgc0uM-Ww7MZWO4RvMqv-zlwL7-3_gAfIFQxw</recordid><startdate>20241228</startdate><enddate>20241228</enddate><creator>Wang, Xijun</creator><creator>Sandoval-Segura, Pedro</creator><creator>Zhang, Chengyuan</creator><creator>Huang, Junyun</creator><creator>Guan, Tianrui</creator><creator>Xian, Ruiqi</creator><creator>Liu, Fuxiao</creator><creator>Chandra, Rohan</creator><creator>Gong, Boqing</creator><creator>Manocha, Dinesh</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241228</creationdate><title>DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments</title><author>Wang, Xijun ; Sandoval-Segura, Pedro ; Zhang, Chengyuan ; Huang, Junyun ; Guan, Tianrui ; Xian, Ruiqi ; Liu, Fuxiao ; Chandra, Rohan ; Gong, Boqing ; Manocha, 
Dinesh</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_200423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Wang, Xijun</creatorcontrib><creatorcontrib>Sandoval-Segura, Pedro</creatorcontrib><creatorcontrib>Zhang, Chengyuan</creatorcontrib><creatorcontrib>Huang, Junyun</creatorcontrib><creatorcontrib>Guan, Tianrui</creatorcontrib><creatorcontrib>Xian, Ruiqi</creatorcontrib><creatorcontrib>Liu, Fuxiao</creatorcontrib><creatorcontrib>Chandra, Rohan</creatorcontrib><creatorcontrib>Gong, Boqing</creatorcontrib><creatorcontrib>Manocha, Dinesh</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Xijun</au><au>Sandoval-Segura, Pedro</au><au>Zhang, Chengyuan</au><au>Huang, Junyun</au><au>Guan, Tianrui</au><au>Xian, Ruiqi</au><au>Liu, Fuxiao</au><au>Chandra, Rohan</au><au>Gong, Boqing</au><au>Manocha, Dinesh</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments</atitle><date>2024-12-28</date><risdate>2024</risdate><abstract>Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g. 
pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.), which require high reasoning ability. DAVE densely annotates over 13 million bounding boxes (bboxes) actors with identification, and more than 1.6 million boxes are annotated with both actor identification and action/behavior details. The videos within DAVE are collected based on a broad spectrum of factors, such as weather conditions, the time of day, road scenarios, and traffic density. DAVE can benchmark video tasks like Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment retrieval, and Multi-label Video Action Recognition. Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, in DAVE, vulnerable road users constitute 41.13% of instances, compared to 23.71% in Waymo. DAVE provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.</abstract><doi>10.48550/arxiv.2412.20042</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2412.20042
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2412_20042
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T21%3A30%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DAVE:%20Diverse%20Atomic%20Visual%20Elements%20Dataset%20with%20High%20Representation%20of%20Vulnerable%20Road%20Users%20in%20Complex%20and%20Unpredictable%20Environments&rft.au=Wang,%20Xijun&rft.date=2024-12-28&rft_id=info:doi/10.48550/arxiv.2412.20042&rft_dat=%3Carxiv_GOX%3E2412_20042%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true