Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
creator | Chen, Long; Sinavski, Oleg; Hünermann, Jan; Karnsund, Alice; Willmott, Andrew James; Birch, Danny; Maund, Daniel; Shotton, Jamie |
description | Large Language Models (LLMs) have shown promise in the autonomous driving
sector, particularly in generalization and interpretability. We introduce a
unique object-level multimodal LLM architecture that merges vectorized numeric
modalities with a pre-trained LLM to improve context understanding in driving
situations. We also present a new dataset of 160k QA pairs derived from 10k
driving scenarios, paired with high-quality control commands collected with an RL
agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct
pretraining strategy is devised to align numeric vector modalities with static
LLM representations using vector captioning language data. We also introduce an
evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency
in interpreting driving scenarios, answering questions, and decision-making.
Our findings highlight the potential of LLM-based driving action generation in
comparison to traditional behavioral cloning. We make our benchmark, datasets,
and model available for further exploration. |
doi_str_mv | 10.48550/arxiv.2310.01957 |
format | Article |
identifier | DOI: 10.48550/arxiv.2310.01957 |
language | eng |
recordid | cdi_arxiv_primary_2310_01957 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics |
title | Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T02%3A36%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Driving%20with%20LLMs:%20Fusing%20Object-Level%20Vector%20Modality%20for%20Explainable%20Autonomous%20Driving&rft.au=Chen,%20Long&rft.date=2023-10-03&rft_id=info:doi/10.48550/arxiv.2310.01957&rft_dat=%3Carxiv_GOX%3E2310_01957%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
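The abstract above describes fusing an object-level numeric vector modality with a pre-trained LLM. As an illustration only (this is not the authors' released implementation, and all module names, dimensions, and the MLP design below are assumptions), the sketch shows one plausible way such a fusion could look: a small projection network maps each object's feature vector to a pseudo-token in the LLM's embedding space, and these pseudo-tokens are prepended to the text prompt embeddings before being fed to the (frozen) LLM.

```python
# Illustrative sketch only; not the paper's code. Assumed: per-object numeric
# features of size obj_feat_dim, an LLM with embedding dimension llm_embed_dim.
import torch
import torch.nn as nn

class VectorFusionSketch(nn.Module):
    def __init__(self, obj_feat_dim: int = 16, llm_embed_dim: int = 4096):
        super().__init__()
        # Small MLP mapping each object's numeric features to one pseudo-token
        # in the LLM embedding space (assumed design, not the authors' module).
        self.proj = nn.Sequential(
            nn.Linear(obj_feat_dim, llm_embed_dim),
            nn.GELU(),
            nn.Linear(llm_embed_dim, llm_embed_dim),
        )

    def forward(self, object_vectors: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # object_vectors: (batch, num_objects, obj_feat_dim)
        # text_embeds:    (batch, num_text_tokens, llm_embed_dim), e.g. from the
        #                 frozen LLM's input embedding layer applied to the prompt
        obj_tokens = self.proj(object_vectors)               # (batch, num_objects, llm_embed_dim)
        return torch.cat([obj_tokens, text_embeds], dim=1)   # fused input sequence

# Usage sketch with small dimensions: the fused sequence would typically be passed
# to the frozen LLM via an inputs_embeds-style argument rather than token ids.
fusion = VectorFusionSketch(obj_feat_dim=16, llm_embed_dim=64)
fused = fusion(torch.randn(2, 32, 16), torch.randn(2, 24, 64))
print(fused.shape)  # torch.Size([2, 56, 64])
```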