ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems

Existing driver action recognition approaches suffer from a bottleneck: the trade-off between recognition accuracy and computational efficiency. More specifically, high-capacity spatial-temporal deep learning models cannot achieve real-time driver action recognition on vehicle-mounted devices. To overcome this limitation, this paper puts forward a novel driver action recognition solution suitable for embedded systems. The proposed ESDAR-Net is a multi-branch deep learning framework that directly processes compressed videos. To reduce computational cost, a lightweight 2D/3D convolutional network is employed for spatial-temporal modeling. Moreover, two strategies are implemented to boost accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) are designed to adaptively fuse appearance and motion information; (2) complementary knowledge from a high-capacity spatial-temporal deep learning model is distilled and transferred to ESDAR-Net. Experimental results show that the proposed ESDAR-Net delivers both high accuracy and real-time performance for driver action recognition: accuracy on SEU-DAR-V1 and SEU-DAR-V2 reaches 98.7% and 96.5%, respectively, with 2.19M learnable parameters, 0.253G FLOPs, and a throughput of 27 clips/s on a JETSON TX2.
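
The record does not specify how the STTPM fuses the streams. As a rough, hypothetical sketch of what a trilinear (three-way) pooling of appearance and motion features can look like, the snippet below attends one stream over the spatial positions of another and pools the result; the function name, tensor shapes, and attention formulation are illustrative assumptions, not the authors' design.

import torch

def trilinear_pool(appearance, motion, temporal):
    # Inputs: three (B, C, N) feature maps, flattened over N spatial
    # positions (assumed shapes). Output: a (B, C) fused descriptor.
    # Affinity between appearance and motion positions: (B, N, N).
    attn = torch.softmax(torch.bmm(appearance.transpose(1, 2), motion), dim=-1)
    # Re-weight the third stream's features by that affinity: (B, C, N).
    fused = torch.bmm(temporal, attn)
    # Global average pooling over positions yields the fused descriptor.
    return fused.mean(dim=-1)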

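The second accuracy strategy is teacher-student knowledge distillation from a high-capacity spatial-temporal model. A minimal PyTorch sketch of the standard soft-target distillation loss follows; the temperature and weighting values are illustrative assumptions, not the paper's settings.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    # Soften both output distributions and pull the student's
    # toward the teacher's (soft-target term).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so the two terms stay comparable
    # Keep a supervised cross-entropy term on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
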
Bibliographic Details
Published in: Multimedia tools and applications, 2024-02, Vol. 83 (6), pp. 18281-18307
Main authors: Hu, Yaocong; Shuai, Zhen; Yang, Huicheng; Wan, Guoyang; Zhang, Yajun; Xie, Chao; Lu, Mingqi; Lu, Xiaobo
Format: Article
Language: English
Subjects: Accuracy; Activity recognition; Computational efficiency; Computer Communication Networks; Computer Science; Computing costs; Data Structures and Information Theory; Deep learning; Embedded systems; Modules; Multimedia Information Systems; Real time; Special Purpose and Application-Based Systems; Track 6: Computer Vision for Multimedia Applications
Online Access: Full text
DOI: 10.1007/s11042-023-15777-0
ISSN: 1573-7721, 1380-7501
EISSN: 1573-7721
Publisher: Springer US (New York)
Source: Springer Nature - Complete Springer Journals