Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
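Main authors: | Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas |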
---|---|
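Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Human-Computer Interaction |
Online access: | Order full text |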
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas |
description | Language models (LMs) can perform complex reasoning either end-to-end, with
hidden latent state, or compositionally, with transparent intermediate state.
Composition offers benefits for interpretability and safety, but may need
workflow support and infrastructure to remain competitive. We describe iterated
decomposition, a human-in-the-loop workflow for developing and refining
compositional LM programs. We improve the performance of compositions by
zooming in on failing components and refining them through decomposition,
additional context, chain of thought, etc. To support this workflow, we develop
ICE, an open-source tool for visualizing the execution traces of LM programs.
We apply iterated decomposition to three real-world tasks and improve the
accuracy of LM programs over less compositional baselines: describing the
placebo used in a randomized controlled trial (25% to 65%), evaluating
participant adherence to a medical intervention (53% to 70%), and answering NLP
questions on the Qasper dataset (38% to 69%). These applications serve as case
studies for a workflow that, if automated, could keep ML systems interpretable
and safe even as they scale to increasingly complex tasks. |
doi_str_mv | 10.48550/arxiv.2301.01751 |
format | Article |
date | 2023-01-04 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
link | https://arxiv.org/abs/2301.01751 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2301.01751 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2301_01751 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Human-Computer Interaction |
title | Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T00%3A34%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Iterated%20Decomposition:%20Improving%20Science%20Q&A%20by%20Supervising%20Reasoning%20Processes&rft.au=Reppert,%20Justin&rft.date=2023-01-04&rft_id=info:doi/10.48550/arxiv.2301.01751&rft_dat=%3Carxiv_GOX%3E2301_01751%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
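The abstract in the description field contrasts end-to-end reasoning with compositional LM programs whose intermediate state stays visible, and describes refining a program by zooming in on failing components. As a hedged illustration only, the Python sketch below assumes a two-step decomposition (per-paragraph relevance filtering, then answering from the selected paragraphs) and a rule-based `toy_lm` standing in for a real model. `Trace`, `Step`, `is_relevant`, `answer_from`, and `decomposed_qa` are hypothetical names invented for this sketch; it is not the ICE API and not the authors' actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumption: any function from prompt text to completion text can play the model.
LM = Callable[[str], str]


@dataclass
class Step:
    """One recorded call in an execution trace (hypothetical structure, not ICE's)."""
    name: str
    inputs: dict
    output: object = None
    children: list = field(default_factory=list)


class Trace:
    """Records nested component calls so every intermediate result stays inspectable."""

    def __init__(self) -> None:
        self.root = Step("program", {})
        self._stack = [self.root]

    def call(self, name: str, fn: Callable, **inputs):
        # Attach a new step under whichever step is currently running.
        step = Step(name, inputs)
        self._stack[-1].children.append(step)
        self._stack.append(step)
        try:
            step.output = fn(**inputs)
        finally:
            self._stack.pop()
        return step.output

    def find(self, name: str) -> List[Step]:
        """Return every recorded step with the given name, in call order."""
        found, todo = [], [self.root]
        while todo:
            step = todo.pop()
            if step.name == name:
                found.append(step)
            todo.extend(reversed(step.children))
        return found

    def render(self, step: Optional[Step] = None, depth: int = 0) -> str:
        """Plain-text view of the trace tree: one indented line per step."""
        step = self.root if step is None else step
        lines = [f"{'  ' * depth}{step.name} -> {step.output!r}"]
        for child in step.children:
            lines.append(self.render(child, depth + 1))
        return "\n".join(lines)


def is_relevant(lm: LM, paragraph: str, question: str) -> bool:
    # Component 1: judge one paragraph at a time, so each judgement is visible.
    reply = lm(f"Paragraph: {paragraph}\nDoes this help answer: {question}? Yes or No:")
    return reply.strip().lower().startswith("yes")


def answer_from(lm: LM, paragraphs: List[str], question: str) -> str:
    # Component 2: answer using only the paragraphs that passed the filter.
    context = "\n".join(paragraphs)
    return lm(f"{context}\nQuestion: {question}\nAnswer:").strip()


def decomposed_qa(lm: LM, trace: Trace, paragraphs: List[str], question: str) -> str:
    relevant = [
        p for p in paragraphs
        if trace.call("is_relevant", is_relevant, lm=lm, paragraph=p, question=question)
    ]
    return trace.call("answer_from", answer_from, lm=lm, paragraphs=relevant, question=question)


def toy_lm(prompt: str) -> str:
    # Hypothetical rule-based stand-in for a real model so the sketch runs offline.
    if prompt.endswith("Yes or No:"):
        return "Yes" if "placebo" in prompt.split("\n")[0].lower() else "No"
    return prompt.split("\n")[0]  # "answer" by echoing the first context paragraph


if __name__ == "__main__":
    paragraphs = [
        "Participants were randomized to two arms.",
        "The control arm received a placebo capsule matched for taste.",
        "Outcomes were measured at 12 weeks.",
    ]
    trace = Trace()
    print(decomposed_qa(toy_lm, trace, paragraphs, "What placebo was used?"))
    print(trace.render())
    # Zoom in on one component: list each relevance judgement next to its input.
    for step in trace.find("is_relevant"):
        print(step.output, "|", step.inputs["paragraph"])
```

Because each relevance judgement is recorded as its own step, a reader can see which component misfired before deciding how to refine it, for example by decomposing it further, adding context, or adding chain-of-thought prompting, the refinements the abstract lists.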