Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model

Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of variou...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Xiong, Siheng, Payani, Ali, Yang, Yuan, Fekri, Faramarz
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Xiong, Siheng
Payani, Ali
Yang, Yuan
Fekri, Faramarz
description Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing methods.
doi_str_mv 10.48550/arxiv.2410.03136
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_03136</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_03136</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_031363</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGBgbGptxMgS7pOZkJqUWJZakKgSlJhbn52XmpSuk5Rcp-Pj4FiskFisElxSVJpeUFqXqJpYnFqUqBOQk5oEVlWeWZCg4JieXgjWH5xflpCj45qek5vAwsKYl5hSn8kJpbgZ5N9cQZw9dsPXxBUWZuYlFlfEgZ8SDnWFMWAUAjx09Lw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><source>arXiv.org</source><creator>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</creator><creatorcontrib>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</creatorcontrib><description>Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing methods.</description><identifier>DOI: 10.48550/arxiv.2410.03136</identifier><language>eng</language><subject>Computer Science - Computation and Language</subject><creationdate>2024-10</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.03136$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.03136$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Xiong, Siheng</creatorcontrib><creatorcontrib>Payani, Ali</creatorcontrib><creatorcontrib>Yang, Yuan</creatorcontrib><creatorcontrib>Fekri, Faramarz</creatorcontrib><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><description>Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing methods.</description><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGBgbGptxMgS7pOZkJqUWJZakKgSlJhbn52XmpSuk5Rcp-Pj4FiskFisElxSVJpeUFqXqJpYnFqUqBOQk5oEVlWeWZCg4JieXgjWH5xflpCj45qek5vAwsKYl5hSn8kJpbgZ5N9cQZw9dsPXxBUWZuYlFlfEgZ8SDnWFMWAUAjx09Lw</recordid><startdate>20241004</startdate><enddate>20241004</enddate><creator>Xiong, Siheng</creator><creator>Payani, Ali</creator><creator>Yang, Yuan</creator><creator>Fekri, Faramarz</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241004</creationdate><title>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</title><author>Xiong, Siheng ; Payani, Ali ; Yang, Yuan ; Fekri, Faramarz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_031363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiong, Siheng</creatorcontrib><creatorcontrib>Payani, Ali</creatorcontrib><creatorcontrib>Yang, Yuan</creatorcontrib><creatorcontrib>Fekri, Faramarz</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiong, Siheng</au><au>Payani, Ali</au><au>Yang, Yuan</au><au>Fekri, Faramarz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model</atitle><date>2024-10-04</date><risdate>2024</risdate><abstract>Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing methods.</abstract><doi>10.48550/arxiv.2410.03136</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2410.03136
ispartof
issn
language eng
recordid cdi_arxiv_primary_2410_03136
source arXiv.org
subjects Computer Science - Computation and Language
title Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T16%3A03%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deliberate%20Reasoning%20for%20LLMs%20as%20Structure-aware%20Planning%20with%20Accurate%20World%20Model&rft.au=Xiong,%20Siheng&rft.date=2024-10-04&rft_id=info:doi/10.48550/arxiv.2410.03136&rft_dat=%3Carxiv_GOX%3E2410_03136%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true