ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems
Abstract: Existing driver action recognition approaches suffer from a bottleneck: the trade-off between recognition accuracy and computational efficiency. More specifically, high-capacity spatial-temporal deep learning models cannot achieve real-time driver action recognition on vehicle-mounted devices. To overcome this limitation, this paper puts forward a novel driver action recognition solution suitable for embedded systems. The proposed ESDAR-Net is a multi-branch deep learning framework that directly processes compressed videos. To reduce computational cost, a lightweight 2D/3D convolutional network is employed for spatial-temporal modeling. Two further strategies boost accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) are designed to adaptively fuse appearance and motion information; (2) complementary knowledge from the high-capacity spatial-temporal deep learning model is distilled and transferred to ESDAR-Net. Experimental results show that ESDAR-Net achieves both high accuracy and real-time performance for driver action recognition: accuracy on SEU-DAR-V1 and SEU-DAR-V2 reaches 98.7% and 96.5% respectively, with 2.19M learnable parameters, 0.253 GFLOPs, and a speed of 27 clips/s on a Jetson TX2.
Published in: Multimedia Tools and Applications, 2024-02, Vol. 83 (6), p. 18281-18307
Authors: Hu, Yaocong; Shuai, Zhen; Yang, Huicheng; Wan, Guoyang; Zhang, Yajun; Xie, Chao; Lu, Mingqi; Lu, Xiaobo
Format: Article
Language: English
Publisher: New York: Springer US
DOI: 10.1007/s11042-023-15777-0
ISSN: 1380-7501 (print); 1573-7721 (electronic)
Subjects: Accuracy; Activity recognition; Computational efficiency; Computer Communication Networks; Computer Science; Computing costs; Data Structures and Information Theory; Deep learning; Embedded systems; Modules; Multimedia Information Systems; Real time; Special Purpose and Application-Based Systems; Track 6: Computer Vision for Multimedia Applications
Online access: Full text
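The abstract describes a lightweight 2D/3D convolutional network for spatial-temporal modeling and a trilinear pooling module that fuses appearance and motion cues. The paper's actual architecture is not reproduced in this record, so the following is only a minimal PyTorch-style sketch of the general idea: the module names, channel sizes, factorized (2+1)D layout, and the Hadamard-product approximation of trilinear pooling are all illustrative assumptions, not ESDAR-Net's design.

```python
# Minimal sketch (assumption: PyTorch; not the paper's code). Shows the
# general shape of a lightweight 2D/3D spatial-temporal block and a cheap
# trilinear-pooling-style fusion of appearance and motion features.
import torch
import torch.nn as nn


class LightSTBlock(nn.Module):
    """Factorized (2+1)D convolution: a 2D spatial conv followed by a 1D
    temporal conv, which costs far fewer FLOPs than a full 3x3x3 conv."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 1x3x3: convolve each frame spatially
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # 3x1x1: mix information across neighbouring frames
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        return self.act(self.bn(self.temporal(self.spatial(x))))


class TrilinearFusion(nn.Module):
    """Hadamard-product approximation of trilinear pooling: the elementwise
    product of three learned projections stands in for the cubic-cost
    interaction tensor over appearance, motion, and joint features."""

    def __init__(self, dim: int, fused_dim: int):
        super().__init__()
        self.proj_app = nn.Linear(dim, fused_dim)
        self.proj_mot = nn.Linear(dim, fused_dim)
        self.proj_joint = nn.Linear(dim, fused_dim)

    def forward(self, appearance, motion, joint):
        return (self.proj_app(appearance)
                * self.proj_mot(motion)
                * self.proj_joint(joint))


# Example: an 8-frame 112x112 RGB clip through one block
clip = torch.randn(1, 3, 8, 112, 112)
features = LightSTBlock(3, 16)(clip)  # -> (1, 16, 8, 112, 112)
```

The factorized layout is what keeps the parameter and FLOP budget in the range the abstract reports (2.19M parameters, 0.253 GFLOPs); a full 3D convolution at the same width would be several times more expensive.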
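The abstract's second strategy, distilling complementary knowledge from a high-capacity spatial-temporal teacher into the lightweight network, is commonly implemented as a softened-logit loss in the style of Hinton et al. A minimal sketch under that assumption follows; the temperature and loss weighting are illustrative values, not figures from the paper.

```python
# Minimal sketch of teacher-student knowledge distillation (assumption:
# the paper's distillation details are not given in this record; the
# temperature and alpha below are illustrative, not the paper's values).
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    # Softened teacher distribution carries inter-class similarity
    # information (e.g. how "phoning" resembles "eating") that hard
    # labels alone do not provide.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # T^2 rescaling keeps the gradient magnitude of the soft term
    # comparable to the hard cross-entropy term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

At training time the high-capacity teacher runs alongside the student; at deployment only the distilled student ships to the embedded device, which is how the accuracy of the large model and the speed of the small one are reconciled.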