MVP: Robust Multi-View Practice for Driving Action Localization

Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track 3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with dense action annotations. Due to the small data scale and unclear action boundaries, the dataset presents a unique challenge: precisely localizing all the different actions and classifying their categories. In this paper, we make good use of the multi-view synchronization among videos and conduct robust Multi-View Practice (MVP) for driving action localization. To avoid overfitting, we fine-tune SlowFast with Kinetics-700 pre-training as the feature extractor. The features of the different views are then passed to ActionFormer to generate candidate action proposals. To precisely localize all the actions, we design elaborate post-processing, including model voting, threshold filtering and duplication removal. The results show that our MVP is robust for driving action localization, achieving a 28.49% F1-score on the Track 3 test set.
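The abstract describes a three-stage pipeline: SlowFast feature extraction, ActionFormer proposal generation, and post-processing. The sketch below illustrates what the threshold-filtering and duplication-removal steps might look like on candidate proposals; it is a minimal illustration, not the authors' code. The function names, the (start, end, label, score) proposal layout, and both thresholds are assumptions, and the model-voting step (combining proposals from multiple models/views) is omitted for brevity.

# Hypothetical sketch of the post-processing described in the abstract:
# threshold filtering followed by per-class duplication removal
# (an NMS-style greedy pass over temporal segments). All names and
# threshold values here are illustrative assumptions.
from typing import List, Tuple

# A proposal: (start_sec, end_sec, action_label, confidence_score)
Proposal = Tuple[float, float, int, float]

def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def post_process(proposals: List[Proposal],
                 score_thresh: float = 0.3,   # assumed value
                 iou_thresh: float = 0.5) -> List[Proposal]:
    """Drop low-confidence proposals, then suppress near-duplicate
    segments of the same action class, keeping the highest scores."""
    # 1) Threshold filtering: discard low-confidence proposals.
    kept = [p for p in proposals if p[3] >= score_thresh]
    # 2) Duplication removal: greedy per-class suppression by score.
    kept.sort(key=lambda p: p[3], reverse=True)
    result: List[Proposal] = []
    for p in kept:
        if all(q[2] != p[2] or temporal_iou(p, q) < iou_thresh
               for q in result):
            result.append(p)
    return result

A greedy score-ordered pass like this is the standard way to remove overlapping duplicates in temporal action localization; the paper's actual criteria for merging proposals across views may differ.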

Bibliographic Details
Main Authors: Shang, Jingjie; Li, Kunchang; Tian, Kaibin; Su, Haisheng; Li, Yangguang
Format: Article
Language: English
Published: 2022-07-05
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2207.02042
Online Access: Request full text