Cut-in maneuver detection with self-supervised contrastive video representation learning

The detection of the maneuvers of the surrounding vehicles is important for autonomous vehicles to act accordingly to avoid possible accidents. This study proposes a framework based on contrastive representation learning to detect potentially dangerous cut-in maneuvers that can happen in front of the ego vehicle. First, the encoder network is trained in a self-supervised fashion with a contrastive loss, where two augmented versions of the same video clip stay close to each other in the embedding space, while augmentations from different videos stay far apart. Since no maneuver labeling is required in this step, a relatively large dataset can be used. After this self-supervised training, the encoder is fine-tuned with our cut-in/lane-pass labeled datasets. Instead of using original video frames, we simplified the scene by highlighting surrounding vehicles and the ego-lane. We have investigated the use of several classification heads, augmentation types, and scene simplification alternatives. The most successful model outperforms the best fully supervised model by ∼2%, with an accuracy of 92.52%.
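The pre-training stage the abstract describes pulls two augmented views of the same clip together in embedding space while pushing views of different clips apart. As a rough illustration only (this is not the authors' implementation, and the paper's exact loss, encoder, and embedding sizes are not given here), a SimCLR-style NT-Xent loss over a batch of paired clip embeddings can be sketched in NumPy:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over a batch of paired embeddings.

    z1, z2: (N, D) arrays holding the encoder outputs for two augmented
    views of the same N video clips (shapes and name are illustrative).
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z1.shape[0]
    # The positive for row i is the other augmented view of the same clip.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

Minimizing such a loss places matching augmented views close together and other clips far apart, which is the property the fine-tuning stage then builds on; no maneuver labels are needed for this step.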

Bibliographic details
Published in: Signal, Image and Video Processing, 2023-09, Vol. 17 (6), p. 2915-2923
Main authors: Nalcakan, Yagiz; Bastanlar, Yalin
Format: Article
Language: English
Subjects:
Online access: Full text
DOI: 10.1007/s11760-023-02512-3
ISSN: 1863-1703
EISSN: 1863-1711
Publisher: London: Springer London
Source: Springer Nature - Complete Springer Journals
Subjects:
Coders
Computer Imaging
Computer Science
Datasets
Image Processing and Computer Vision
Learning
Maneuvers
Multimedia Information Systems
Original Paper
Pattern Recognition and Graphics
Representations
Signal, Image and Speech Processing
Vehicles
Video
Vision
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A33%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cut-in%20maneuver%20detection%20with%20self-supervised%20contrastive%20video%20representation%20learning&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Nalcakan,%20Yagiz&rft.date=2023-09-01&rft.volume=17&rft.issue=6&rft.spage=2915&rft.epage=2923&rft.pages=2915-2923&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-023-02512-3&rft_dat=%3Cproquest_cross%3E2826804718%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2826804718&rft_id=info:pmid/&rfr_iscdi=true