Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference
Autonomous driving holds great potential to transform road safety and traffic efficiency by minimizing human error and reducing congestion. A key challenge in realizing this potential is the accurate estimation of steering angles, which is essential for effective vehicle navigation and control. Recent breakthroughs in deep learning have made it possible to estimate steering angles directly from raw camera inputs. However, the limited available navigation data can hinder optimal feature learning, impacting the system's performance in complex driving scenarios. In this paper, we propose a shared encoder trained on multiple computer vision tasks critical for urban navigation, such as depth, pose, and 3D scene flow estimation, as well as semantic, instance, panoptic, and motion segmentation. By incorporating diverse visual information used by humans during navigation, this unified encoder might enhance steering angle estimation. To achieve effective multi-task learning within a single encoder, we introduce a multi-scale feature network for pose estimation to improve depth learning. Additionally, we employ knowledge distillation from a multi-backbone model pretrained on these navigation tasks to stabilize training and boost performance. Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities. While our performance in steering angle estimation is comparable to existing methods, the integration of human-like perception through multi-task learning holds significant potential for advancing autonomous driving systems. More details and the pretrained model are available at https://hi-computervision.github.io/uni-encoder/.
Saved in:
Published in: | arXiv.org 2024-09 |
---|---|
Main authors: | Nguyen, Huy-Dung; Bairouk, Anass; Maras, Mirjana; Xiao, Wei; Wang, Tsun-Hsuan; Chareyre, Patrick; Hasani, Ramin; Blanchon, Marc; Rus, Daniela |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
creator | Nguyen, Huy-Dung; Bairouk, Anass; Maras, Mirjana; Xiao, Wei; Wang, Tsun-Hsuan; Chareyre, Patrick; Hasani, Ramin; Blanchon, Marc; Rus, Daniela |
description | Autonomous driving holds great potential to transform road safety and traffic efficiency by minimizing human error and reducing congestion. A key challenge in realizing this potential is the accurate estimation of steering angles, which is essential for effective vehicle navigation and control. Recent breakthroughs in deep learning have made it possible to estimate steering angles directly from raw camera inputs. However, the limited available navigation data can hinder optimal feature learning, impacting the system's performance in complex driving scenarios. In this paper, we propose a shared encoder trained on multiple computer vision tasks critical for urban navigation, such as depth, pose, and 3D scene flow estimation, as well as semantic, instance, panoptic, and motion segmentation. By incorporating diverse visual information used by humans during navigation, this unified encoder might enhance steering angle estimation. To achieve effective multi-task learning within a single encoder, we introduce a multi-scale feature network for pose estimation to improve depth learning. Additionally, we employ knowledge distillation from a multi-backbone model pretrained on these navigation tasks to stabilize training and boost performance. Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities. While our performance in steering angle estimation is comparable to existing methods, the integration of human-like perception through multi-task learning holds significant potential for advancing autonomous driving systems. More details and the pretrained model are available at https://hi-computervision.github.io/uni-encoder/. |
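The abstract above describes a shared encoder whose features feed several task heads, with feature-level knowledge distillation from a pretrained multi-backbone teacher. A minimal NumPy sketch of that structure follows; all dimensions, the linear encoder, and the L2 feature-matching loss are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the record does not specify the real ones.
IN_DIM, FEAT_DIM, N_TASKS = 128, 64, 3

# Shared encoder: one linear map + ReLU standing in for the unified backbone.
W_enc = rng.standard_normal((IN_DIM, FEAT_DIM)) * 0.01

def encode(x):
    """Map raw inputs to the shared latent space (computed once per batch)."""
    return np.maximum(x @ W_enc, 0.0)

# One lightweight linear head per task (depth, segmentation, steering, ...).
heads = [rng.standard_normal((FEAT_DIM, 1)) * 0.01 for _ in range(N_TASKS)]

def multi_task_outputs(x):
    z = encode(x)                  # shared features
    return [z @ h for h in heads]  # every head reuses the same features

def distillation_loss(student_feat, teacher_feat):
    """L2 feature matching against a pretrained multi-backbone teacher."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

x = rng.standard_normal((4, IN_DIM))         # a batch of 4 flattened inputs
outputs = multi_task_outputs(x)              # one prediction per task
teacher_feat = rng.standard_normal((4, FEAT_DIM))
kd = distillation_loss(encode(x), teacher_feat)
```

The point of the sketch is the cost structure: the encoder runs once, and each additional task adds only a small head, which is what makes the multi-task setup efficient at inference time.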
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
publication date | 2024-09-16 |
rights | 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-09 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3106239828 |
source | Free E-Journals |
subjects | Autonomous navigation; Coders; Computer vision; Deep learning; Human error; Human performance; Perception; Pose estimation; Steering; Task complexity; Three dimensional flow; Three dimensional motion; Traffic congestion; Traffic control; Traffic safety; Visual tasks |
title | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T18%3A27%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Human%20Insights%20Driven%20Latent%20Space%20for%20Different%20Driving%20Perspectives:%20A%20Unified%20Encoder%20for%20Efficient%20Multi-Task%20Inference&rft.jtitle=arXiv.org&rft.au=Nguyen,%20Huy-Dung&rft.date=2024-09-16&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3106239828%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3106239828&rft_id=info:pmid/&rfr_iscdi=true |