Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for interpretability and safety, but may need workflow support and infrastructure to remain competitive. We describe iterated decomposition, a human-in-the-loop workflow for developing and refining compositional LM programs. We improve the performance of compositions by zooming in on failing components and refining them through decomposition, additional context, chain of thought, etc. To support this workflow, we develop ICE, an open-source tool for visualizing the execution traces of LM programs. We apply iterated decomposition to three real-world tasks and improve the accuracy of LM programs over less compositional baselines: describing the placebo used in a randomized controlled trial (25% to 65%), evaluating participant adherence to a medical intervention (53% to 70%), and answering NLP questions on the Qasper dataset (38% to 69%). These applications serve as case studies for a workflow that, if automated, could keep ML systems interpretable and safe even as they scale to increasingly complex tasks.

Detailed Description

Saved in:
Bibliographic Details
Main authors: Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Reppert, Justin; Rachbach, Ben; George, Charlie; Stebbing, Luke; Byun, Jungwon; Appleton, Maggie; Stuhlmüller, Andreas
description Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for interpretability and safety, but may need workflow support and infrastructure to remain competitive. We describe iterated decomposition, a human-in-the-loop workflow for developing and refining compositional LM programs. We improve the performance of compositions by zooming in on failing components and refining them through decomposition, additional context, chain of thought, etc. To support this workflow, we develop ICE, an open-source tool for visualizing the execution traces of LM programs. We apply iterated decomposition to three real-world tasks and improve the accuracy of LM programs over less compositional baselines: describing the placebo used in a randomized controlled trial (25% to 65%), evaluating participant adherence to a medical intervention (53% to 70%), and answering NLP questions on the Qasper dataset (38% to 69%). These applications serve as case studies for a workflow that, if automated, could keep ML systems interpretable and safe even as they scale to increasingly complex tasks.
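The description above contrasts end-to-end reasoning (hidden latent state) with compositional LM programs (transparent intermediate state), where failing components can be inspected and refined individually. A minimal sketch of that idea follows, assuming only a generic prompt-to-completion function; the decomposition, function names, and trace structure are hypothetical illustrations, not the paper's ICE API.

```python
# Sketch of a compositional LM program: a question is split into
# subquestions, each answered by a separate component, and every
# intermediate prompt/answer pair is recorded in an inspectable trace.
# The LM call is a stand-in stub, not a real model.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

LM = Callable[[str], str]  # any prompt -> completion function


@dataclass
class Trace:
    """Records intermediate steps, keeping the program's state transparent."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, prompt: str, answer: str) -> None:
        self.steps.append((prompt, answer))


def decompose(question: str) -> List[str]:
    # Hypothetical fixed decomposition for a placebo-description task.
    return [
        f"Which passage describes the control arm? ({question})",
        f"Given that passage, what placebo was used? ({question})",
    ]


def answer_compositionally(question: str, lm: LM, trace: Trace) -> str:
    partials = []
    for sub in decompose(question):
        ans = lm(sub)
        trace.log(sub, ans)  # every intermediate state is visible
        partials.append(ans)
    synthesis = f"Combine into one answer: {partials}"
    final = lm(synthesis)
    trace.log(synthesis, final)
    return final
```

Because each subquestion is a separate call with a logged prompt and answer, a developer can zoom in on whichever component fails and refine just that component (further decomposition, added context, chain of thought), which is the workflow the abstract names iterated decomposition.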
format Article
identifier DOI: 10.48550/arxiv.2301.01751
language eng
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Human-Computer Interaction
title Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes