Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks
While LLMs excel at processing the text of human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses.
Saved in:
Published in: | arXiv.org, 2024-11 |
---|---|
Main Authors: | Sun, Xingpeng; Meng, Haoming; Chakraborty, Souradip; Amrit Singh Bedi; Bera, Aniket |
Format: | Article |
Language: | eng |
Subjects: | Decision making; Human engineering; Large language models; Navigation; Robots; Verbal communication |
Online Access: | Full Text |
container_title | arXiv.org |
---|---|
creator | Sun, Xingpeng; Meng, Haoming; Chakraborty, Souradip; Amrit Singh Bedi; Bera, Aniket |
description | While LLMs excel at processing the text of human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text, an approach that improves LLM decision-making by integrating audio transcription with a subset of these features, chosen because they capture affect and are most relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5, respectively), but also enhances robustness against token-manipulation adversarial attacks, with a winning-rate decrease 22.44% smaller than that of the text-only language model. Beyond Text marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models. |
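The description sketches the core recipe: transcribe the spoken instruction, extract affect-related paralinguistic features from the same audio, and hand both to the LLM as text. Below is a minimal sketch of that idea, assuming librosa for feature extraction; the specific features (pitch, energy, speaking rate) and the prompt wording are illustrative assumptions, not the authors' published pipeline.

```python
# Sketch: pair an audio transcription with simple paralinguistic
# (affect-related) features and serialize both into a prompt for a
# text-only LLM. Feature choices and prompt wording are assumptions
# for illustration, not the Beyond Text implementation.
import numpy as np
import librosa


def extract_paralinguistic_features(wav_path: str, transcript: str) -> dict:
    """Compute coarse prosodic statistics from a spoken instruction."""
    y, sr = librosa.load(wav_path, sr=None)
    duration = librosa.get_duration(y=y, sr=sr)

    # Fundamental frequency (pitch) via probabilistic YIN; NaN where unvoiced.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    # Root-mean-square energy as a loudness proxy.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "mean_pitch_hz": float(np.nanmean(f0)),
        "pitch_variability_hz": float(np.nanstd(f0)),
        "mean_energy": float(rms.mean()),
        "speaking_rate_wps": len(transcript.split()) / duration,
    }


def build_prompt(transcript: str, feats: dict) -> str:
    """Combine transcript and vocal cues into one prompt for the LLM."""
    cues = ", ".join(f"{k}={v:.2f}" for k, v in feats.items())
    return (
        "A person gave a robot this spoken instruction.\n"
        f"Transcript: {transcript!r}\n"
        f"Vocal cues: {cues}\n"
        "Considering both the words and how they were said, decide whether "
        "the speaker sounds certain. Answer: proceed, clarify, or stop."
    )
```

In this framing, the LLM itself stays unchanged; the gain comes from surfacing how something was said alongside what was said, which is also why the abstract can compare directly against text-only baselines such as gpt-3.5.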
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2923192557 |
source | Free E-Journals |
subjects | Decision making; Human engineering; Large language models; Navigation; Robots; Verbal communication |
title | Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A21%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Beyond%20Text:%20Utilizing%20Vocal%20Cues%20to%20Improve%20Decision%20Making%20in%20LLMs%20for%20Robot%20Navigation%20Tasks&rft.jtitle=arXiv.org&rft.au=Sun,%20Xingpeng&rft.date=2024-11-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2923192557%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2923192557&rft_id=info:pmid/&rfr_iscdi=true |