ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems
Abstract: Existing driver action recognition approaches suffer from a bottleneck: the trade-off between recognition accuracy and computational efficiency. More specifically, high-capacity spatial-temporal deep learning models cannot achieve real-time driver action recognition on vehicle-mounted devices. To overcome this limitation, this paper puts forward a novel driver action recognition solution suitable for embedded systems. The proposed ESDAR-Net is a multi-branch deep learning framework that directly processes compressed videos. To reduce computational cost, a lightweight 2D/3D convolutional network is employed for spatial-temporal modeling. Two further strategies boost accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) are designed to adaptively fuse appearance and motion information; (2) complementary knowledge from the high-capacity spatial-temporal deep learning model is distilled and transferred to ESDAR-Net. Experimental results show that ESDAR-Net achieves both high accuracy and real-time performance for driver action recognition: accuracy on SEU-DAR-V1 and SEU-DAR-V2 reaches 98.7% and 96.5% respectively, with 2.19M learnable parameters, 0.253 GFLOPs, and a speed of 27 clips/s on a Jetson TX2.
Published in: Multimedia Tools and Applications, 2024-02, Vol. 83 (6), p. 18281-18307
Authors: Hu, Yaocong; Shuai, Zhen; Yang, Huicheng; Wan, Guoyang; Zhang, Yajun; Xie, Chao; Lu, Mingqi; Lu, Xiaobo
Format: Article
Language: English
Publisher: New York: Springer US
DOI: 10.1007/s11042-023-15777-0
ISSN: 1380-7501 (print); 1573-7721 (electronic)
Subjects: Accuracy; Activity recognition; Computational efficiency; Computer Communication Networks; Computer Science; Computing costs; Data Structures and Information Theory; Deep learning; Embedded systems; Modules; Multimedia Information Systems; Real time; Special Purpose and Application-Based Systems; Track 6: Computer Vision for Multimedia Applications
Online access: Full text
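The abstract describes a lightweight 2D/3D convolutional network for spatial-temporal modeling and a trilinear pooling module that fuses appearance and motion cues. The paper's actual architecture is not reproduced in this record, so the following is only a minimal PyTorch-style sketch of the general idea: the module names, channel sizes, factorized (2+1)D layout, and the Hadamard-product approximation of trilinear pooling are all illustrative assumptions, not ESDAR-Net's design.

```python
# Minimal sketch (assumption: PyTorch; not the paper's code). Shows the
# general shape of a lightweight 2D/3D spatial-temporal block and a cheap
# trilinear-pooling-style fusion of appearance and motion features.
import torch
import torch.nn as nn


class LightSTBlock(nn.Module):
    """Factorized (2+1)D convolution: a 2D spatial conv followed by a 1D
    temporal conv, which costs far fewer FLOPs than a full 3x3x3 conv."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 1x3x3: convolve each frame spatially
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # 3x1x1: mix information across neighbouring frames
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        return self.act(self.bn(self.temporal(self.spatial(x))))


class TrilinearFusion(nn.Module):
    """Hadamard-product approximation of trilinear pooling: the elementwise
    product of three learned projections stands in for the cubic-cost
    interaction tensor over appearance, motion, and joint features."""

    def __init__(self, dim: int, fused_dim: int):
        super().__init__()
        self.proj_app = nn.Linear(dim, fused_dim)
        self.proj_mot = nn.Linear(dim, fused_dim)
        self.proj_joint = nn.Linear(dim, fused_dim)

    def forward(self, appearance, motion, joint):
        return (self.proj_app(appearance)
                * self.proj_mot(motion)
                * self.proj_joint(joint))


# Example: an 8-frame 112x112 RGB clip through one block
clip = torch.randn(1, 3, 8, 112, 112)
features = LightSTBlock(3, 16)(clip)  # -> (1, 16, 8, 112, 112)
```

The factorized layout is what keeps the parameter and FLOP budget in the range the abstract reports (2.19M parameters, 0.253 GFLOPs); a full 3D convolution at the same width would be several times more expensive.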
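The abstract's second strategy, distilling complementary knowledge from a high-capacity spatial-temporal teacher into the lightweight network, is commonly implemented as a softened-logit loss in the style of Hinton et al. A minimal sketch under that assumption follows; the temperature and loss weighting are illustrative values, not figures from the paper.

```python
# Minimal sketch of teacher-student knowledge distillation (assumption:
# the paper's distillation details are not given in this record; the
# temperature and alpha below are illustrative, not the paper's values).
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    # Softened teacher distribution carries inter-class similarity
    # information (e.g. how "phoning" resembles "eating") that hard
    # labels alone do not provide.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # T^2 rescaling keeps the gradient magnitude of the soft term
    # comparable to the hard cross-entropy term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

At training time the high-capacity teacher runs alongside the student; at deployment only the distilled student ships to the embedded device, which is how the accuracy of the large model and the speed of the small one are reconciled.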