Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network
Describing a traffic scenario from the driver's perspective is a challenging process for an Advanced Driving Assistance System (ADAS), involving different sub-tasks such as detection, tracking, and segmentation. Previous methods mainly focus on independent sub-tasks and have difficulty describing the incidents comprehensively.
Saved in:
Published in: | IEEE transactions on intelligent transportation systems, 2024-05, Vol. 25 (5), p. 3615-3627 |
---|---|
Main authors: | Liu, Chunsheng; Zhang, Xiao; Chang, Faliang; Li, Shuang; Hao, Penghui; Lu, Yansha; Wang, Yinhai |
Format: | Article |
Language: | eng |
Keywords: | Advanced driver assistance systems; Annotations; attention mechanism; Attention mechanisms; Behavioral sciences; Cameras; Datasets; Decoding; Encoders-Decoders; Feature extraction; guidance captioning; Semantics; Traffic control; Traffic scenario understanding; video captioning; Visual analytics; Visualization |
Online access: | Order full text |
Field | Value
---|---
container_end_page | 3627
container_issue | 5 |
container_start_page | 3615 |
container_title | IEEE transactions on intelligent transportation systems |
container_volume | 25 |
creator | Liu, Chunsheng; Zhang, Xiao; Chang, Faliang; Li, Shuang; Hao, Penghui; Lu, Yansha; Wang, Yinhai |
description | Describing a traffic scenario from the driver's perspective is a challenging process for an Advanced Driving Assistance System (ADAS), involving different sub-tasks such as detection, tracking, and segmentation. Previous methods mainly focus on independent sub-tasks and have difficulty describing the incidents comprehensively. In this study, the problem is newly treated as a video captioning task, and a Guidance Attention Captioning Network (GAC-Network) is proposed to describe an incident in a concise single sentence. In the GAC-Network, an Attention-based Encoder-Decoder Net (AED-Net) is built as the main network; with its temporal-spatial attention mechanisms, the AED-Net can effectively reject unimportant traffic behaviors and redundant backgrounds. To handle diverse driving scenarios, Spatio-Temporal Layer Normalization is used to improve generalization ability. To generate captions for incidents in driving, a novel Guidance Module is proposed that helps the encoder-decoder model generate words that relate better to the past and future words of a caption. Because there is no public dataset for captioning of driving scenarios, the Traffic Video Captioning (TVC) dataset is released for the video captioning task in driving scenarios. Experimental results show that the proposed methods can fulfill the captioning task for complex driving scenarios and achieve higher performance than the comparison methods, with results at least 2.5%, 1.8%, 3.6%, and 13.1% better on BLEU_1, METEOR, ROUGE_L, and CIDEr, respectively. |
doi_str_mv | 10.1109/TITS.2023.3323085 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1524-9050 |
ispartof | IEEE transactions on intelligent transportation systems, 2024-05, Vol.25 (5), p.3615-3627 |
issn | 1524-9050; 1558-0016 |
language | eng |
recordid | cdi_proquest_journals_3055167345 |
source | IEEE Electronic Library (IEL) |
subjects | Advanced driver assistance systems; Annotations; attention mechanism; Attention mechanisms; Behavioral sciences; Cameras; Datasets; Decoding; Encoders-Decoders; Feature extraction; guidance captioning; Semantics; Traffic control; Traffic scenario understanding; video captioning; Visual analytics; Visualization |
title | Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T16%3A49%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Traffic%20Scenario%20Understanding%20and%20Video%20Captioning%20via%20Guidance%20Attention%20Captioning%20Network&rft.jtitle=IEEE%20transactions%20on%20intelligent%20transportation%20systems&rft.au=Liu,%20Chunsheng&rft.date=2024-05-01&rft.volume=25&rft.issue=5&rft.spage=3615&rft.epage=3627&rft.pages=3615-3627&rft.issn=1524-9050&rft.eissn=1558-0016&rft.coden=ITISFG&rft_id=info:doi/10.1109/TITS.2023.3323085&rft_dat=%3Cproquest_RIE%3E3055167345%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3055167345&rft_id=info:pmid/&rft_ieee_id=10323234&rfr_iscdi=true |
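As context for the abstract above: the attention-based encoder-decoder it describes follows a common video-captioning pattern, in which a decoder re-weights the encoded video frames at every word-generation step. Below is a minimal PyTorch sketch of that generic pattern only; the module names, dimensions, and the additive (Bahdanau-style) temporal attention are illustrative assumptions, not details of the authors' GAC-Network.

```python
# Minimal sketch of an attention-based encoder-decoder video captioner.
# NOT the paper's GAC-Network: all names and sizes are hypothetical.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Scores each encoded frame against the decoder state, so the decoder
    can down-weight irrelevant frames (the generic counterpart of the
    abstract's "rejecting unimportant traffic behaviors")."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int = 256):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)
        self.dec_proj = nn.Linear(dec_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, enc_feats, dec_state):
        # enc_feats: (B, T, enc_dim) frame features; dec_state: (B, dec_dim)
        e = self.score(torch.tanh(
            self.enc_proj(enc_feats) + self.dec_proj(dec_state).unsqueeze(1)
        )).squeeze(-1)                       # (B, T) attention logits
        alpha = torch.softmax(e, dim=1)      # per-frame weights
        context = (alpha.unsqueeze(-1) * enc_feats).sum(dim=1)  # (B, enc_dim)
        return context, alpha


class CaptionDecoder(nn.Module):
    """One-layer LSTM decoder that attends over frame features at each step."""

    def __init__(self, vocab_size, enc_dim=2048, emb_dim=300, dec_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = TemporalAttention(enc_dim, dec_dim)
        self.lstm = nn.LSTMCell(emb_dim + enc_dim, dec_dim)
        self.out = nn.Linear(dec_dim, vocab_size)

    def forward(self, enc_feats, captions):
        # enc_feats: (B, T, enc_dim); captions: (B, L) token ids (teacher forcing)
        B, L = captions.shape
        h = enc_feats.new_zeros(B, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(L - 1):
            context, _ = self.attn(enc_feats, h)
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)    # (B, L-1, vocab_size)


# Smoke test with random "frame features" standing in for a CNN encoder.
feats = torch.randn(2, 16, 2048)             # 2 clips, 16 frames each
caps = torch.randint(0, 1000, (2, 12))       # 2 captions, 12 tokens each
model = CaptionDecoder(vocab_size=1000)
print(model(feats, caps).shape)              # torch.Size([2, 11, 1000])
```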
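The abstract reports gains on BLEU_1, METEOR, ROUGE_L, and CIDEr. As a small illustration of the first of these, here is how BLEU_1 (unigram precision with a brevity penalty) can be computed with NLTK on made-up captions; captioning papers typically evaluate with the coco-caption toolkit, which also implements METEOR, ROUGE_L, and CIDEr.

```python
# Toy BLEU_1 computation with NLTK; the sentences are invented examples,
# not data from the paper's TVC dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "car", "brakes", "suddenly", "ahead"]]   # ground-truth caption(s)
candidate = ["a", "car", "stops", "suddenly", "ahead"]      # generated caption

# weights=(1, 0, 0, 0) restricts the score to unigram precision, i.e. BLEU_1.
bleu1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU_1 = {bleu1:.3f}")  # 4 of 5 unigrams match -> 0.800
```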