Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model
Main authors: | Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz |
description | Enhancing the reasoning capabilities of large language models (LLMs) remains
a key challenge, especially for tasks that require complex, multi-step
decision-making. Humans excel at these tasks by leveraging deliberate planning
with an internal world model to simulate the potential outcomes of various
actions. Inspired by this, we propose a novel multi-step reasoning framework
for LLMs, referred to as Structure-aware Planning with Accurate World Model
(SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT)
reasoning in natural language, SWAP incorporates structural information to
guide the reasoning process via a world model and provides a soft verification
mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate
world state predictions in complex reasoning tasks by introducing a
Generator-Discriminator architecture, which enables more reliable world
modeling. Specifically, the generator predicts the next state, and the
discriminator ensures alignment with the logical consistency required by the
problem context. SWAP also encourages the policy model to explore a broad range
of potential actions to prevent premature convergence. By resolving the
bottlenecks of generation diversity for both actions and states using
diversity-based modeling (DBM) and improving discrimination accuracy through
contrastive ranking (CR), SWAP significantly enhances the reasoning performance
of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks
including math reasoning, logical reasoning, and coding tasks. Extensive
experiments demonstrate that SWAP achieves substantial improvements over the
baselines and consistently outperforms existing methods. |
doi_str_mv | 10.48550/arxiv.2410.03136 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_03136</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_03136</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_031363</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGBgbGptxMgS7pOZkJqUWJZakKgSlJhbn52XmpSuk5Rcp-Pj4FiskFisElxSVJpeUFqXqJpYnFqUqBOQk5oEVlWeWZCg4JieXgjWH5xflpCj45qek5vAwsKYl5hSn8kJpbgZ5N9cQZw9dsPXxBUWZuYlFlfEgZ8SDnWFMWAUAjx09Lw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><source>arXiv.org</source><creator>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</creator><creatorcontrib>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</creatorcontrib><description>Enhancing the reasoning capabilities of large language models (LLMs) remains
a key challenge, especially for tasks that require complex, multi-step
decision-making. Humans excel at these tasks by leveraging deliberate planning
with an internal world model to simulate the potential outcomes of various
actions. Inspired by this, we propose a novel multi-step reasoning framework
for LLMs, referred to as Structure-aware Planning with Accurate World Model
(SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT)
reasoning in natural language, SWAP incorporates structural information to
guide the reasoning process via a world model and provides a soft verification
mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate
world state predictions in complex reasoning tasks by introducing a
Generator-Discriminator architecture, which enables more reliable world
modeling. Specifically, the generator predicts the next state, and the
discriminator ensures alignment with the logical consistency required by the
problem context. SWAP also encourages the policy model to explore a broad range
of potential actions to prevent premature convergence. By resolving the
bottlenecks of generation diversity for both actions and states using
diversity-based modeling (DBM) and improving discrimination accuracy through
contrastive ranking (CR), SWAP significantly enhances the reasoning performance
of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks
including math reasoning, logical reasoning, and coding tasks. Extensive
experiments demonstrate that SWAP achieves substantial improvements over the
baselines and consistently outperforms existing methods.</description><identifier>DOI: 10.48550/arxiv.2410.03136</identifier><language>eng</language><subject>Computer Science - Computation and Language</subject><creationdate>2024-10</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.03136$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.03136$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Xiong, Siheng</creatorcontrib><creatorcontrib>Payani, Ali</creatorcontrib><creatorcontrib>Yang, Yuan</creatorcontrib><creatorcontrib>Fekri, Faramarz</creatorcontrib><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><description>Enhancing the reasoning capabilities of large language models (LLMs) remains
a key challenge, especially for tasks that require complex, multi-step
decision-making. Humans excel at these tasks by leveraging deliberate planning
with an internal world model to simulate the potential outcomes of various
actions. Inspired by this, we propose a novel multi-step reasoning framework
for LLMs, referred to as Structure-aware Planning with Accurate World Model
(SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT)
reasoning in natural language, SWAP incorporates structural information to
guide the reasoning process via a world model and provides a soft verification
mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate
world state predictions in complex reasoning tasks by introducing a
Generator-Discriminator architecture, which enables more reliable world
modeling. Specifically, the generator predicts the next state, and the
discriminator ensures alignment with the logical consistency required by the
problem context. SWAP also encourages the policy model to explore a broad range
of potential actions to prevent premature convergence. By resolving the
bottlenecks of generation diversity for both actions and states using
diversity-based modeling (DBM) and improving discrimination accuracy through
contrastive ranking (CR), SWAP significantly enhances the reasoning performance
of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks
including math reasoning, logical reasoning, and coding tasks. Extensive
experiments demonstrate that SWAP achieves substantial improvements over the
baselines and consistently outperforms existing methods.</description><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGBgbGptxMgS7pOZkJqUWJZakKgSlJhbn52XmpSuk5Rcp-Pj4FiskFisElxSVJpeUFqXqJpYnFqUqBOQk5oEVlWeWZCg4JieXgjWH5xflpCj45qek5vAwsKYl5hSn8kJpbgZ5N9cQZw9dsPXxBUWZuYlFlfEgZ8SDnWFMWAUAjx09Lw</recordid><startdate>20241004</startdate><enddate>20241004</enddate><creator>Xiong, Siheng</creator><creator>Payani, Ali</creator><creator>Yang, Yuan</creator><creator>Fekri, Faramarz</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241004</creationdate><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><author>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_031363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiong, Siheng</creatorcontrib><creatorcontrib>Payani, Ali</creatorcontrib><creatorcontrib>Yang, Yuan</creatorcontrib><creatorcontrib>Fekri, Faramarz</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiong, Siheng</au><au>Payani, Ali</au><au>Yang, Yuan</au><au>Fekri, Faramarz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</atitle><date>2024-10-04</date><risdate>2024</risdate><abstract>Enhancing the reasoning capabilities of 
large language models (LLMs) remains
a key challenge, especially for tasks that require complex, multi-step
decision-making. Humans excel at these tasks by leveraging deliberate planning
with an internal world model to simulate the potential outcomes of various
actions. Inspired by this, we propose a novel multi-step reasoning framework
for LLMs, referred to as Structure-aware Planning with Accurate World Model
(SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT)
reasoning in natural language, SWAP incorporates structural information to
guide the reasoning process via a world model and provides a soft verification
mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate
world state predictions in complex reasoning tasks by introducing a
Generator-Discriminator architecture, which enables more reliable world
modeling. Specifically, the generator predicts the next state, and the
discriminator ensures alignment with the logical consistency required by the
problem context. SWAP also encourages the policy model to explore a broad range
of potential actions to prevent premature convergence. By resolving the
bottlenecks of generation diversity for both actions and states using
diversity-based modeling (DBM) and improving discrimination accuracy through
contrastive ranking (CR), SWAP significantly enhances the reasoning performance
of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks
including math reasoning, logical reasoning, and coding tasks. Extensive
experiments demonstrate that SWAP achieves substantial improvements over the
baselines and consistently outperforms existing methods.</abstract><doi>10.48550/arxiv.2410.03136</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.03136 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2410_03136 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T16%3A03%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deliberate%20Reasoning%20for%20LLMs%20as%20Structure-aware%20Planning%20with%20Accurate%20World%20Model&rft.au=Xiong,%20Siheng&rft.date=2024-10-04&rft_id=info:doi/10.48550/arxiv.2410.03136&rft_dat=%3Carxiv_GOX%3E2410_03136%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
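The abstract describes SWAP only at a high level. A policy model proposes actions, a generator predicts the next world state, and a discriminator checks that prediction for logical consistency, which acts as a soft verification over the reasoning steps. The toy Python sketch below illustrates that control flow under explicit assumptions. It is not the authors' implementation. The world state is a single integer and actions are strings such as `"+3"`. The functions `policy`, `generator`, `discriminator`, `rank`, and `plan` are hypothetical plain-function stand-ins for what the paper describes as LLM components. The discriminator has oracle access to the true transition, the ranking step only loosely echoes contrastive ranking, and neither diversity-based modeling (DBM) nor contrastive ranking (CR) training is modeled.

```python
"""Toy, hypothetical sketch of world-model-guided planning.

NOT the SWAP authors' implementation. The policy, generator and
discriminator are plain functions standing in for LLM components.
Assumptions: the world state is one integer and actions are strings
like "+3", "*2" or "-1".
"""
from typing import List, Optional, Tuple

State = int
Action = str


def apply_action(state: State, action: Action) -> State:
    """Ground-truth transition for the toy world."""
    op, value = action[0], int(action[1:])
    if op == "+":
        return state + value
    if op == "-":
        return state - value
    if op == "*":
        return state * value
    raise ValueError(f"unknown action {action!r}")


def policy(state: State) -> List[Action]:
    # Stand-in policy: proposes several distinct actions rather than a
    # single one (a loose nod to broad exploration, not DBM itself).
    return ["+3", "*2", "-1"]


def generator(state: State, action: Action) -> List[State]:
    # Stand-in world-model generator: emits several candidate next
    # states, one correct and two off by one, to mimic noisy predictions.
    true_next = apply_action(state, action)
    return [true_next, true_next + 1, true_next - 1]


def discriminator(state: State, action: Action, candidate: State) -> float:
    # Stand-in discriminator with ORACLE access to the true transition;
    # a real one would be a learned model judging logical consistency.
    # Score is 0.0 for a consistent prediction, negative otherwise.
    return float(-abs(candidate - apply_action(state, action)))


def rank(state: State, action: Action,
         candidates: List[State]) -> List[Tuple[float, State]]:
    # Compare candidates against one another and sort best-first,
    # instead of accepting or rejecting each one in isolation.
    scored = [(discriminator(state, action, c), c) for c in candidates]
    return sorted(scored, reverse=True)


def plan(start: State, goal: State,
         max_steps: int = 6) -> List[Tuple[Action, State]]:
    """Greedy one-step lookahead toward `goal`."""
    state, trace = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        best: Optional[Tuple[float, Action, State]] = None
        for action in policy(state):
            score, next_state = rank(state, action, generator(state, action))[0]
            # Soft verification stand-in: the discriminator score is a
            # penalty added to the goal heuristic, not a hard veto.
            value = -abs(goal - next_state) + score
            if best is None or value > best[0]:
                best = (value, action, next_state)
        assert best is not None  # policy always proposes >= 1 action
        _, action, state = best
        trace.append((action, state))
    return trace


if __name__ == "__main__":
    print(plan(1, 11))
```

Run as a script, it prints `[('+3', 4), ('*2', 8), ('+3', 11)]`. Adding the discriminator score as a penalty, rather than discarding low-scoring predictions outright, is one possible reading of "soft verification"; the paper may define the mechanism differently.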