Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

While LLMs excel at processing text in human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of the audio responses: the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text, an approach that improves LLM decision-making by integrating audio transcription with a subset of these features, chosen because they capture affect and are most relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% and 48.30% (over gemini-1.5-pro and gpt-3.5, respectively), but also improves robustness against token-manipulation adversarial attacks, showing a 22.44% smaller drop in winning rate than the text-only language model. Beyond Text marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models.
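The abstract's core mechanism, pairing a transcript with affect-oriented paralinguistic features before the LLM makes a decision, can be sketched in a few lines. Below is a minimal, hypothetical illustration using librosa; the feature set, function names, and prompt format are assumptions made for illustration and are not the authors' actual pipeline.

```python
# Hypothetical sketch of the idea in the abstract: extract coarse
# paralinguistic (affect-related) statistics from a spoken instruction
# and hand them to an LLM together with the transcript.
import numpy as np
import librosa

def paralinguistic_features(wav_path: str) -> dict:
    """Compute coarse prosodic statistics from an audio file."""
    y, sr = librosa.load(wav_path, sr=16000)
    # Fundamental frequency (pitch) track; NaN where a frame is unvoiced.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    rms = librosa.feature.rms(y=y)[0]  # frame-wise loudness proxy
    return {
        "mean_pitch_hz": float(np.nanmean(f0)),     # raised pitch can signal urgency
        "pitch_std_hz": float(np.nanstd(f0)),       # monotone vs. expressive delivery
        "mean_energy": float(rms.mean()),           # overall loudness
        "voiced_ratio": float(voiced_flag.mean()),  # rough speaking-rate proxy
    }

def build_prompt(transcript: str, feats: dict) -> str:
    """Fold the transcript and the affect cues into one text prompt."""
    cues = ", ".join(f"{k}={v:.2f}" for k, v in feats.items())
    return (
        "A person gave a robot this spoken instruction:\n"
        f'"{transcript}"\n'
        f"Paralinguistic cues: {cues}\n"
        "Considering both the words and how they were said, decide which "
        "navigation action the robot should take and explain your choice."
    )

# Example usage (any chat-style LLM backend could consume the prompt):
# prompt = build_prompt("Could you hurry to the kitchen?",
#                       paralinguistic_features("instruction.wav"))
```

The paper's actual feature selection and integration are richer than this toy version; the sketch only shows how lexical content and delivery cues can be combined in a single text interface to the model.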

Bibliographic Details

Published in: arXiv.org, 2024-11
Main authors: Sun, Xingpeng; Meng, Haoming; Chakraborty, Souradip; Amrit Singh Bedi; Bera, Aniket
Format: Article
Language: English
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Subjects: Decision making; Human engineering; Large language models; Navigation; Robots; Verbal communication
Online access: Full text