Cut-in maneuver detection with self-supervised contrastive video representation learning
The detection of the maneuvers of the surrounding vehicles is important for autonomous vehicles to act accordingly to avoid possible accidents. This study proposes a framework based on contrastive representation learning to detect potentially dangerous cut-in maneuvers that can happen in front of the ego vehicle.
Saved in:
Published in: | Signal, image and video processing, 2023-09, Vol.17 (6), p.2915-2923 |
Main authors: | Nalcakan, Yagiz; Bastanlar, Yalin |
Format: | Article |
Language: | English |
Online access: | Full text |
container_end_page | 2923 |
container_issue | 6 |
container_start_page | 2915 |
container_title | Signal, image and video processing |
container_volume | 17 |
creator | Nalcakan, Yagiz; Bastanlar, Yalin |
description | The detection of the maneuvers of the surrounding vehicles is important for autonomous vehicles to act accordingly to avoid possible accidents. This study proposes a framework based on contrastive representation learning to detect potentially dangerous cut-in maneuvers that can happen in front of the ego vehicle. First, the encoder network is trained in a self-supervised fashion with contrastive loss where two augmented videos of the same video clip stay close to each other in the embedding space, while augmentations from different videos stay far apart. Since no maneuver labeling is required in this step, a relatively large dataset can be used. After this self-supervised training, the encoder is fine-tuned with our cut-in/lane-pass labeled datasets. Instead of using original video frames, we simplified the scene by highlighting surrounding vehicles and ego-lane. We have investigated the use of several classification heads, augmentation types, and scene simplification alternatives. The most successful model outperforms the best fully supervised model by ~2% with an accuracy of 92.52%. |
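The self-supervised stage described in the abstract pulls two augmented views of the same clip together in embedding space while pushing views of different clips apart, which is the behaviour of an NT-Xent-style contrastive loss (as popularized by SimCLR). The sketch below is a minimal, dependency-free illustration of that objective on toy 2-D embeddings; it is not the authors' implementation, and the function names, temperature value, and example vectors are assumptions made for the illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent-style contrastive loss over a batch of N clips.

    views_a[i] and views_b[i] are embeddings of two augmentations of
    clip i (the positive pair); every other embedding in the batch
    serves as a negative for that pair.
    """
    z = views_a + views_b           # 2N embeddings, positives N apart
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)       # index of i's positive partner
        pos = math.exp(cosine(z[i], z[j]) / temperature)
        # Denominator: all other embeddings, positives and negatives.
        denom = sum(math.exp(cosine(z[i], z[k]) / temperature)
                    for k in range(2 * n) if k != i)
        total += -math.log(pos / denom)
    return total / (2 * n)
```

Minimizing this loss drives the two views of each clip toward the same direction in embedding space while separating them from other clips' views; in practice the loss is computed over encoder outputs on GPU, with the temperature acting as a tuning knob.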
doi_str_mv | 10.1007/s11760-023-02512-3 |
format | Article |
publisher | London: Springer London |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2023-09, Vol.17 (6), p.2915-2923 |
issn | 1863-1703 (print); 1863-1711 (electronic) |
language | eng |
recordid | cdi_proquest_journals_2826804718 |
source | Springer Nature - Complete Springer Journals |
subjects | Coders; Computer Imaging; Computer Science; Datasets; Image Processing and Computer Vision; Learning; Maneuvers; Multimedia Information Systems; Original Paper; Pattern Recognition and Graphics; Representations; Signal, Image and Speech Processing; Vehicles; Video; Vision |
title | Cut-in maneuver detection with self-supervised contrastive video representation learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A33%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cut-in%20maneuver%20detection%20with%20self-supervised%20contrastive%20video%20representation%20learning&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Nalcakan,%20Yagiz&rft.date=2023-09-01&rft.volume=17&rft.issue=6&rft.spage=2915&rft.epage=2923&rft.pages=2915-2923&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-023-02512-3&rft_dat=%3Cproquest_cross%3E2826804718%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2826804718&rft_id=info:pmid/&rfr_iscdi=true |