Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network
Published in: | IEEE Transactions on Intelligent Transportation Systems 2024-05, Vol.25 (5), p.3615-3627 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Describing a traffic scenario from the driver's perspective is a challenging process for an Advanced Driving Assistance System (ADAS), involving different sub-tasks such as detection, tracking, and segmentation. Previous methods mainly focus on independent sub-tasks and have difficulty describing incidents comprehensively. In this study, the problem is treated in a novel way as a video captioning task, and a Guidance Attention Captioning Network (GAC-Network) structure is proposed for describing incidents in a single concise sentence. In GAC-Network, an Attention-based Encoder-Decoder Net (AED-Net) is built as the main network; with its temporal and spatial attention mechanisms, the AED-Net makes it possible to effectively reject unimportant traffic behaviors and redundant backgrounds. To account for the variety of driving scenarios, Spatio-Temporal Layer Normalization is used to improve generalization ability. To generate captions for driving incidents, a novel Guidance Module is proposed that helps the encoder-decoder model generate words that relate better to the past and future words in a caption. Because there is no public dataset for captioning driving scenarios, the Traffic Video Captioning (TVC) dataset is released for the video captioning task in driving scenarios. Experimental results show that the proposed methods can fulfill the captioning task for complex driving scenarios and achieve higher performance than the compared methods, with improvements of at least 2.5%, 1.8%, 3.6%, and 13.1% on BLEU_1, METEOR, ROUGE_L, and CIDEr, respectively. |
---|---|
ISSN: | 1524-9050, 1558-0016 |
DOI: | 10.1109/TITS.2023.3323085 |