Controlling Large Language Model Agents with Entropic Activation Steering

The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration: the ability to actively gather information about the environment. But how do LLM agents explore, and how can we control their exploratory behaviors? To answer these questions, we take a representation-level perspective and introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. First, we demonstrate that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the LLM's outputs, in contrast to token-level temperature sampling. Second, we show how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent toward more exploratory actions. Finally, we demonstrate that the steering vectors obtained by EAST generalize across task variants. Together, these results show that LLM agents explicitly encode uncertainty over their actions in their representation space. Our work paves the way for a new understanding of how LLM agents function and for effective control of their decision-making behaviors.
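To make the idea of activation steering concrete, here is a minimal illustrative sketch, not the paper's exact EAST procedure. A common recipe for building a steering vector is the difference of mean hidden activations collected under two contrasting behaviors (e.g., more exploratory vs. more greedy prompts); at inference time, the scaled vector is added to the model's hidden state. All function names below are hypothetical, and plain Python lists stand in for real model activations.

```python
# Illustrative activation-steering sketch (hypothetical names, toy data).
# Real implementations add the vector to a transformer's residual-stream
# activations at a chosen layer; here a hidden state is just a list of floats.

def mean_vector(activations):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(activations)
    return [sum(col) / n for col in zip(*activations)]

def steering_vector(exploratory_acts, greedy_acts):
    """Difference-of-means direction pointing toward exploratory behavior."""
    mu_explore = mean_vector(exploratory_acts)
    mu_greedy = mean_vector(greedy_acts)
    return [a - b for a, b in zip(mu_explore, mu_greedy)]

def steer(hidden_state, direction, scale=1.0):
    """Add the scaled steering direction to a hidden state at inference time."""
    return [h + scale * d for h, d in zip(hidden_state, direction)]
```

The `scale` coefficient controls the strength of the intervention; note that, unlike temperature sampling, this operates on internal representations rather than on the output token distribution.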

Bibliographic details

Main authors: Rahn, Nate; D'Oro, Pierluca; Bellemare, Marc G
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
Publication date: 2024-05-31
Online access: https://arxiv.org/abs/2406.00244
DOI: 10.48550/arxiv.2406.00244
Source: arXiv.org