SciAgent: Tool-augmented Language Models for Scientific Reasoning

Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shif...

Bibliographic details
Main authors:	Ma, Yubo; Gou, Zhibin; Hao, Junheng; Xu, Ruochen; Wang, Shuohang; Pan, Liangming; Yang, Yujiu; Cao, Yixin; Sun, Aixin; Awadalla, Hany; Chen, Weizhu
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Ma, Yubo; Gou, Zhibin; Hao, Junheng; Xu, Ruochen; Wang, Shuohang; Pan, Liangming; Yang, Yujiu; Cao, Yixin; Sun, Aixin; Awadalla, Hany; Chen, Weizhu
description	Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.
doi_str_mv	10.48550/arxiv.2402.11451
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2402_11451</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2402_11451</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-b387e0bb77f23a42daf7eecd9a91a5db482b2e85d8c25811db565767f1cac6ad3</originalsourceid><addsrcrecordid>eNotj81KxDAUhbNxIaMP4GryAq29adJkZlcG_6AiaPflJrkJgU4jrSP69tbR1eFwPg58jN1AVUqjVHWL81f6LIWsRAkgFVyy9s2lNtL0sed9zmOBp3hcG3ne4RRPGIk_Z0_jwkOe-QqvYwrJ8VfCJU9pilfsIuC40PV_blh_f9cfHovu5eHp0HYFNhoKWxtNlbVaB1GjFB6DJnJ-hztA5a00wgoyyhsnlAHwVjVKNzqAQ9egrzds-3d7dhje53TE-Xv4dRnOLvUPXcBEpQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><source>arXiv.org</source><creator>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</creator><creatorcontrib>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</creatorcontrib><description>Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. 
Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</description><identifier>DOI: 10.48550/arxiv.2402.11451</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2024-02</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2402.11451$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2402.11451$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Yubo</creatorcontrib><creatorcontrib>Gou, Zhibin</creatorcontrib><creatorcontrib>Hao, Junheng</creatorcontrib><creatorcontrib>Xu, Ruochen</creatorcontrib><creatorcontrib>Wang, Shuohang</creatorcontrib><creatorcontrib>Pan, Liangming</creatorcontrib><creatorcontrib>Yang, Yujiu</creatorcontrib><creatorcontrib>Cao, Yixin</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Awadalla, Hany</creatorcontrib><creatorcontrib>Chen, Weizhu</creatorcontrib><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><description>Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). 
To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj81KxDAUhbNxIaMP4GryAq29adJkZlcG_6AiaPflJrkJgU4jrSP69tbR1eFwPg58jN1AVUqjVHWL81f6LIWsRAkgFVyy9s2lNtL0sed9zmOBp3hcG3ne4RRPGIk_Z0_jwkOe-QqvYwrJ8VfCJU9pilfsIuC40PV_blh_f9cfHovu5eHp0HYFNhoKWxtNlbVaB1GjFB6DJnJ-hztA5a00wgoyyhsnlAHwVjVKNzqAQ9egrzds-3d7dhje53TE-Xv4dRnOLvUPXcBEpQ</recordid><startdate>20240217</startdate><enddate>20240217</enddate><creator>Ma, Yubo</creator><creator>Gou, Zhibin</creator><creator>Hao, Junheng</creator><creator>Xu, Ruochen</creator><creator>Wang, Shuohang</creator><creator>Pan, Liangming</creator><creator>Yang, Yujiu</creator><creator>Cao, Yixin</creator><creator>Sun, Aixin</creator><creator>Awadalla, Hany</creator><creator>Chen, 
Weizhu</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240217</creationdate><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><author>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-b387e0bb77f23a42daf7eecd9a91a5db482b2e85d8c25811db565767f1cac6ad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Ma, Yubo</creatorcontrib><creatorcontrib>Gou, Zhibin</creatorcontrib><creatorcontrib>Hao, Junheng</creatorcontrib><creatorcontrib>Xu, Ruochen</creatorcontrib><creatorcontrib>Wang, Shuohang</creatorcontrib><creatorcontrib>Pan, Liangming</creatorcontrib><creatorcontrib>Yang, Yujiu</creatorcontrib><creatorcontrib>Cao, Yixin</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Awadalla, Hany</creatorcontrib><creatorcontrib>Chen, Weizhu</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ma, Yubo</au><au>Gou, Zhibin</au><au>Hao, Junheng</au><au>Xu, Ruochen</au><au>Wang, Shuohang</au><au>Pan, Liangming</au><au>Yang, Yujiu</au><au>Cao, Yixin</au><au>Sun, Aixin</au><au>Awadalla, Hany</au><au>Chen, Weizhu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SciAgent: Tool-augmented Language Models for Scientific Reasoning</atitle><date>2024-02-17</date><risdate>2024</risdate><abstract>Scientific reasoning poses an excessive challenge for even the most advanced Large 
Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</abstract><doi>10.48550/arxiv.2402.11451</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2402.11451
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2402_11451
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language
title	SciAgent: Tool-augmented Language Models for Scientific Reasoning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T11%3A08%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SciAgent:%20Tool-augmented%20Language%20Models%20for%20Scientific%20Reasoning&rft.au=Ma,%20Yubo&rft.date=2024-02-17&rft_id=info:doi/10.48550/arxiv.2402.11451&rft_dat=%3Carxiv_GOX%3E2402_11451%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true