Archon: An Architecture Search Framework for Inference-Time Techniques

Inference-time techniques are emerging as highly effective tools to enhance large language model (LLM) capabilities. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of individual inference-time techniques and the interactions between them. Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space. To address these challenges, we introduce Archon, a modular framework for selecting, combining, and stacking layers of inference-time techniques to construct optimized LLM systems for target benchmarks. Rather than relying on a single LLM called once, we leverage a diverse set of LLMs and inference-time techniques, creating LLM systems greater than the sum of their parts. Archon defines an extensible design space, encompassing techniques such as generation ensembling, repeated sampling, ranking, fusion, critiquing, verification, and unit testing. It transforms the problem of building LLM systems into a hyperparameter optimization objective. Given the available LLMs, inference-time techniques, and compute budget, Archon utilizes hyperparameter search techniques to discover optimized architectures for target benchmark(s). We evaluate Archon architectures across a range of instruction-following, reasoning, and coding benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. Archon architectures outperform frontier models, such as GPT-4o and Claude 3.5 Sonnet, on these benchmarks, achieving an average accuracy increase of 15.1 percentage points by using all available LLMs. We make our code and datasets publicly available on GitHub: https://github.com/ScalingIntelligence/Archon.
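The abstract frames building an LLM system as a hyperparameter optimization problem over a design space of inference-time techniques. A minimal toy sketch of that framing, assuming a made-up design space and a stand-in scoring function (`DESIGN_SPACE`, `mock_benchmark_score`, and all numbers here are illustrative assumptions, not the actual Archon API or the paper's results):

```python
import itertools

# Hypothetical design space for an LLM system: how many models to ensemble,
# how many samples to draw per model, and whether to add a fusion layer
# that merges the candidate generations.
DESIGN_SPACE = {
    "ensemble_size": [1, 2, 4],
    "samples_per_model": [1, 5],
    "use_fusion": [False, True],
}

def mock_benchmark_score(config):
    # Stand-in for evaluating a configuration on a real benchmark such as
    # MT-Bench. Rewards larger ensembles and fusion with diminishing
    # returns; the numbers are purely illustrative.
    score = 50.0
    score += 5.0 * config["ensemble_size"] ** 0.5
    score += 2.0 * config["samples_per_model"] ** 0.5
    if config["use_fusion"] and config["ensemble_size"] > 1:
        score += 4.0  # fusion only helps when there are outputs to merge
    return score

def search(design_space, evaluate):
    """Exhaustively score every configuration and return the best one."""
    keys = list(design_space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(design_space[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = search(DESIGN_SPACE, mock_benchmark_score)
```

With a real evaluation function this exhaustive loop would be replaced by a budget-aware search strategy, since each configuration evaluation costs actual benchmark inference.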


Bibliographic Details
Main Authors: Saad-Falcon, Jon; Lafuente, Adrian Gamarra; Natarajan, Shlok; Maru, Nahum; Todorov, Hristo; Guha, Etash; Buchanan, E. Kelly; Chen, Mayee; Guha, Neel; Ré, Christopher; Mirhoseini, Azalia
Format: Article
Language: eng
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
Online Access: Order full text
description Inference-time techniques are emerging as highly effective tools to enhance large language model (LLM) capabilities. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of individual inference-time techniques and the interactions between them. Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space. To address these challenges, we introduce Archon, a modular framework for selecting, combining, and stacking layers of inference-time techniques to construct optimized LLM systems for target benchmarks. Rather than relying on a single LLM called once, we leverage a diverse set of LLMs and inference-time techniques, creating LLM systems greater than the sum of their parts. Archon defines an extensible design space, encompassing techniques such as generation ensembling, repeated sampling, ranking, fusion, critiquing, verification, and unit testing. It transforms the problem of building LLM systems into a hyperparameter optimization objective. Given the available LLMs, inference-time techniques, and compute budget, Archon utilizes hyperparameter search techniques to discover optimized architectures for target benchmark(s). We evaluate Archon architectures across a range of instruction-following, reasoning, and coding benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. Archon architectures outperform frontier models, such as GPT-4o and Claude 3.5 Sonnet, on these benchmarks, achieving an average accuracy increase of 15.1 percentage points by using all available LLMs. We make our code and datasets available publicly on Github: https://github.com/ScalingIntelligence/Archon.
doi_str_mv 10.48550/arxiv.2409.15254
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2409.15254
language eng
recordid cdi_arxiv_primary_2409_15254
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
title Archon: An Architecture Search Framework for Inference-Time Techniques
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T10%3A23%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Archon:%20An%20Architecture%20Search%20Framework%20for%20Inference-Time%20Techniques&rft.au=Saad-Falcon,%20Jon&rft.date=2024-09-23&rft_id=info:doi/10.48550/arxiv.2409.15254&rft_dat=%3Carxiv_GOX%3E2409_15254%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true