SciAgent: Tool-augmented Language Models for Scientific Reasoning

Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shif...

Detailed description

Bibliographic details
Main authors: Ma, Yubo, Gou, Zhibin, Hao, Junheng, Xu, Ruochen, Wang, Shuohang, Pan, Liangming, Yang, Yujiu, Cao, Yixin, Sun, Aixin, Awadalla, Hany, Chen, Weizhu
Format: Article
Language: eng
Subjects:
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Ma, Yubo
Gou, Zhibin
Hao, Junheng
Xu, Ruochen
Wang, Shuohang
Pan, Liangming
Yang, Yujiu
Cao, Yixin
Sun, Aixin
Awadalla, Hany
Chen, Weizhu
description Scientific reasoning remains a severe challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets and shifts the focus from pursuing an omniscient problem solver to a proficient tool user. To facilitate research in this setting, we construct a tool-augmented training corpus named MathFunc, which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs of the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B substantially outperforms ChatGPT.
doi_str_mv 10.48550/arxiv.2402.11451
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2402_11451</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2402_11451</sourcerecordid><originalsourceid>FETCH-LOGICAL-a671-b387e0bb77f23a42daf7eecd9a91a5db482b2e85d8c25811db565767f1cac6ad3</originalsourceid><addsrcrecordid>eNotj81KxDAUhbNxIaMP4GryAq29adJkZlcG_6AiaPflJrkJgU4jrSP69tbR1eFwPg58jN1AVUqjVHWL81f6LIWsRAkgFVyy9s2lNtL0sed9zmOBp3hcG3ne4RRPGIk_Z0_jwkOe-QqvYwrJ8VfCJU9pilfsIuC40PV_blh_f9cfHovu5eHp0HYFNhoKWxtNlbVaB1GjFB6DJnJ-hztA5a00wgoyyhsnlAHwVjVKNzqAQ9egrzds-3d7dhje53TE-Xv4dRnOLvUPXcBEpQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><source>arXiv.org</source><creator>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</creator><creatorcontrib>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</creatorcontrib><description>Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. 
Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</description><identifier>DOI: 10.48550/arxiv.2402.11451</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2024-02</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,886</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2402.11451$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2402.11451$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Yubo</creatorcontrib><creatorcontrib>Gou, Zhibin</creatorcontrib><creatorcontrib>Hao, Junheng</creatorcontrib><creatorcontrib>Xu, Ruochen</creatorcontrib><creatorcontrib>Wang, Shuohang</creatorcontrib><creatorcontrib>Pan, Liangming</creatorcontrib><creatorcontrib>Yang, Yujiu</creatorcontrib><creatorcontrib>Cao, Yixin</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Awadalla, Hany</creatorcontrib><creatorcontrib>Chen, Weizhu</creatorcontrib><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><description>Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). 
To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj81KxDAUhbNxIaMP4GryAq29adJkZlcG_6AiaPflJrkJgU4jrSP69tbR1eFwPg58jN1AVUqjVHWL81f6LIWsRAkgFVyy9s2lNtL0sed9zmOBp3hcG3ne4RRPGIk_Z0_jwkOe-QqvYwrJ8VfCJU9pilfsIuC40PV_blh_f9cfHovu5eHp0HYFNhoKWxtNlbVaB1GjFB6DJnJ-hztA5a00wgoyyhsnlAHwVjVKNzqAQ9egrzds-3d7dhje53TE-Xv4dRnOLvUPXcBEpQ</recordid><startdate>20240217</startdate><enddate>20240217</enddate><creator>Ma, Yubo</creator><creator>Gou, Zhibin</creator><creator>Hao, Junheng</creator><creator>Xu, Ruochen</creator><creator>Wang, Shuohang</creator><creator>Pan, Liangming</creator><creator>Yang, Yujiu</creator><creator>Cao, Yixin</creator><creator>Sun, Aixin</creator><creator>Awadalla, Hany</creator><creator>Chen, 
Weizhu</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240217</creationdate><title>SciAgent: Tool-augmented Language Models for Scientific Reasoning</title><author>Ma, Yubo ; Gou, Zhibin ; Hao, Junheng ; Xu, Ruochen ; Wang, Shuohang ; Pan, Liangming ; Yang, Yujiu ; Cao, Yixin ; Sun, Aixin ; Awadalla, Hany ; Chen, Weizhu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a671-b387e0bb77f23a42daf7eecd9a91a5db482b2e85d8c25811db565767f1cac6ad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Ma, Yubo</creatorcontrib><creatorcontrib>Gou, Zhibin</creatorcontrib><creatorcontrib>Hao, Junheng</creatorcontrib><creatorcontrib>Xu, Ruochen</creatorcontrib><creatorcontrib>Wang, Shuohang</creatorcontrib><creatorcontrib>Pan, Liangming</creatorcontrib><creatorcontrib>Yang, Yujiu</creatorcontrib><creatorcontrib>Cao, Yixin</creatorcontrib><creatorcontrib>Sun, Aixin</creatorcontrib><creatorcontrib>Awadalla, Hany</creatorcontrib><creatorcontrib>Chen, Weizhu</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ma, Yubo</au><au>Gou, Zhibin</au><au>Hao, Junheng</au><au>Xu, Ruochen</au><au>Wang, Shuohang</au><au>Pan, Liangming</au><au>Yang, Yujiu</au><au>Cao, Yixin</au><au>Sun, Aixin</au><au>Awadalla, Hany</au><au>Chen, Weizhu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SciAgent: Tool-augmented Language Models for Scientific Reasoning</atitle><date>2024-02-17</date><risdate>2024</risdate><abstract>Scientific reasoning poses an excessive challenge for even the most advanced Large 
Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.</abstract><doi>10.48550/arxiv.2402.11451</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2402.11451
ispartof
issn
language eng
recordid cdi_arxiv_primary_2402_11451
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
title SciAgent: Tool-augmented Language Models for Scientific Reasoning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T11%3A08%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SciAgent:%20Tool-augmented%20Language%20Models%20for%20Scientific%20Reasoning&rft.au=Ma,%20Yubo&rft.date=2024-02-17&rft_id=info:doi/10.48550/arxiv.2402.11451&rft_dat=%3Carxiv_GOX%3E2402_11451%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true