Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks
While LLMs excel at processing the text of human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses.
Saved in:
Published in: | arXiv.org, 2024-11 |
---|---|
Main Authors: | Sun, Xingpeng; Meng, Haoming; Chakraborty, Souradip; Amrit Singh Bedi; Bera, Aniket |
Format: | Article |
Language: | eng |
Subjects: | Decision making; Human engineering; Large language models; Navigation; Robots; Verbal communication |
Online Access: | Full Text |
container_title | arXiv.org |
---|---|
creator | Sun, Xingpeng; Meng, Haoming; Chakraborty, Souradip; Amrit Singh Bedi; Bera, Aniket |
description | While LLMs excel at processing the text of human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text, an approach that improves LLM decision-making by integrating audio transcription with a subset of these features, chosen because they capture affect and are most relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5, respectively), but also enhances robustness against token-manipulation adversarial attacks, with a winning-rate decrease 22.44% smaller than that of the text-only language model. Beyond Text marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models. |
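The description sketches the core recipe: transcribe the spoken instruction, extract affect-related paralinguistic features from the same audio, and hand both to the LLM as text. Below is a minimal sketch of that idea, assuming librosa for feature extraction; the specific features (pitch, energy, speaking rate) and the prompt wording are illustrative assumptions, not the authors' published pipeline.

```python
# Sketch: pair an audio transcription with simple paralinguistic
# (affect-related) features and serialize both into a prompt for a
# text-only LLM. Feature choices and prompt wording are assumptions
# for illustration, not the Beyond Text implementation.
import numpy as np
import librosa


def extract_paralinguistic_features(wav_path: str, transcript: str) -> dict:
    """Compute coarse prosodic statistics from a spoken instruction."""
    y, sr = librosa.load(wav_path, sr=None)
    duration = librosa.get_duration(y=y, sr=sr)

    # Fundamental frequency (pitch) via probabilistic YIN; NaN where unvoiced.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    # Root-mean-square energy as a loudness proxy.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "mean_pitch_hz": float(np.nanmean(f0)),
        "pitch_variability_hz": float(np.nanstd(f0)),
        "mean_energy": float(rms.mean()),
        "speaking_rate_wps": len(transcript.split()) / duration,
    }


def build_prompt(transcript: str, feats: dict) -> str:
    """Combine transcript and vocal cues into one prompt for the LLM."""
    cues = ", ".join(f"{k}={v:.2f}" for k, v in feats.items())
    return (
        "A person gave a robot this spoken instruction.\n"
        f"Transcript: {transcript!r}\n"
        f"Vocal cues: {cues}\n"
        "Considering both the words and how they were said, decide whether "
        "the speaker sounds certain. Answer: proceed, clarify, or stop."
    )
```

In this framing, the LLM itself stays unchanged; the gain comes from surfacing how something was said alongside what was said, which is also why the abstract can compare directly against text-only baselines such as gpt-3.5.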
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2923192557 |
source | Free E-Journals |
subjects | Decision making; Human engineering; Large language models; Navigation; Robots; Verbal communication |
title | Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A21%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Beyond%20Text:%20Utilizing%20Vocal%20Cues%20to%20Improve%20Decision%20Making%20in%20LLMs%20for%20Robot%20Navigation%20Tasks&rft.jtitle=arXiv.org&rft.au=Sun,%20Xingpeng&rft.date=2024-11-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2923192557%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2923192557&rft_id=info:pmid/&rfr_iscdi=true |