Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for interpretability and safety, but may need workflow support and infrastructure to remain competitive. We describe iterated decomposition, a human-in-the-loop workflow for developing and refining compositional LM programs. We improve the performance of compositions by zooming in on failing components and refining them through decomposition, additional context, chain of thought, etc. To support this workflow, we develop ICE, an open-source tool for visualizing the execution traces of LM programs. We apply iterated decomposition to three real-world tasks and improve the accuracy of LM programs over less compositional baselines: describing the placebo used in a randomized controlled trial (25% to 65%), evaluating participant adherence to a medical intervention (53% to 70%), and answering NLP questions on the Qasper dataset (38% to 69%). These applications serve as case studies for a workflow that, if automated, could keep ML systems interpretable and safe even as they scale to increasingly complex tasks.

Detailed Description

Saved in:
Bibliographic Details
Main authors: Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas
description Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for interpretability and safety, but may need workflow support and infrastructure to remain competitive. We describe iterated decomposition, a human-in-the-loop workflow for developing and refining compositional LM programs. We improve the performance of compositions by zooming in on failing components and refining them through decomposition, additional context, chain of thought, etc. To support this workflow, we develop ICE, an open-source tool for visualizing the execution traces of LM programs. We apply iterated decomposition to three real-world tasks and improve the accuracy of LM programs over less compositional baselines: describing the placebo used in a randomized controlled trial (25% to 65%), evaluating participant adherence to a medical intervention (53% to 70%), and answering NLP questions on the Qasper dataset (38% to 69%). These applications serve as case studies for a workflow that, if automated, could keep ML systems interpretable and safe even as they scale to increasingly complex tasks.
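The description above contrasts end-to-end reasoning (hidden latent state) with compositional LM programs (transparent intermediate state), where failing components can be inspected and refined individually. A minimal sketch of that idea follows, assuming only a generic prompt-to-completion function; the decomposition, function names, and trace structure are hypothetical illustrations, not the paper's ICE API.

```python
# Sketch of a compositional LM program: a question is split into
# subquestions, each answered by a separate component, and every
# intermediate prompt/answer pair is recorded in an inspectable trace.
# The LM call is a stand-in stub, not a real model.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

LM = Callable[[str], str]  # any prompt -> completion function


@dataclass
class Trace:
    """Records intermediate steps, keeping the program's state transparent."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, prompt: str, answer: str) -> None:
        self.steps.append((prompt, answer))


def decompose(question: str) -> List[str]:
    # Hypothetical fixed decomposition for a placebo-description task.
    return [
        f"Which passage describes the control arm? ({question})",
        f"Given that passage, what placebo was used? ({question})",
    ]


def answer_compositionally(question: str, lm: LM, trace: Trace) -> str:
    partials = []
    for sub in decompose(question):
        ans = lm(sub)
        trace.log(sub, ans)  # every intermediate state is visible
        partials.append(ans)
    synthesis = f"Combine into one answer: {partials}"
    final = lm(synthesis)
    trace.log(synthesis, final)
    return final
```

Because each subquestion is a separate call with a logged prompt and answer, a developer can zoom in on whichever component fails and refine just that component (further decomposition, added context, chain of thought), which is the workflow the abstract names iterated decomposition.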
format Article
identifier DOI: 10.48550/arxiv.2301.01751
language eng
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Human-Computer Interaction
title Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes