TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning an...

Bibliographic details
Main authors:	Caciularu, Avi; Jacovi, Alon; Ben-David, Eyal; Goldshtein, Sasha; Schuster, Tal; Herzig, Jonathan; Elidan, Gal; Globerson, Amir
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Caciularu, Avi ; Jacovi, Alon ; Ben-David, Eyal ; Goldshtein, Sasha ; Schuster, Tal ; Herzig, Jonathan ; Elidan, Gal ; Globerson, Amir
description	Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.
doi_str_mv	10.48550/arxiv.2406.03618
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2406_03618</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2406_03618</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2406_036183</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0zMwNjO04GTwD3F0DrFScEwpS8xLzsxLV3DOzy3ISa1QcExPL0pNTyzJLEtVCEpNLM7PA8mWZ5ZkKHjmpeUX5QKl8vMUXCtKihKTwcyQ_PycYh4G1rTEnOJUXijNzSDv5hri7KELtjq-oCgzN7GoMh7khHiwE4wJqwAAi3Y8Jg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools</title><source>arXiv.org</source><creator>Caciularu, Avi ; Jacovi, Alon ; Ben-David, Eyal ; Goldshtein, Sasha ; Schuster, Tal ; Herzig, Jonathan ; Elidan, Gal ; Globerson, Amir</creator><creatorcontrib>Caciularu, Avi ; Jacovi, Alon ; Ben-David, Eyal ; Goldshtein, Sasha ; Schuster, Tal ; Herzig, Jonathan ; Elidan, Gal ; Globerson, Amir</creatorcontrib><description>Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. 
To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.</description><identifier>DOI: 10.48550/arxiv.2406.03618</identifier><language>eng</language><subject>Computer Science - Computation and Language</subject><creationdate>2024-06</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2406.03618$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2406.03618$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Caciularu, Avi</creatorcontrib><creatorcontrib>Jacovi, Alon</creatorcontrib><creatorcontrib>Ben-David, Eyal</creatorcontrib><creatorcontrib>Goldshtein, Sasha</creatorcontrib><creatorcontrib>Schuster, Tal</creatorcontrib><creatorcontrib>Herzig, Jonathan</creatorcontrib><creatorcontrib>Elidan, Gal</creatorcontrib><creatorcontrib>Globerson, Amir</creatorcontrib><title>TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools</title><description>Large Language Models (LLMs) often do not perform well on 
queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. 
This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.</description><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0zMwNjO04GTwD3F0DrFScEwpS8xLzsxLV3DOzy3ISa1QcExPL0pNTyzJLEtVCEpNLM7PA8mWZ5ZkKHjmpeUX5QKl8vMUXCtKihKTwcyQ_PycYh4G1rTEnOJUXijNzSDv5hri7KELtjq-oCgzN7GoMh7khHiwE4wJqwAAi3Y8Jg</recordid><startdate>20240605</startdate><enddate>20240605</enddate><creator>Caciularu, Avi</creator><creator>Jacovi, Alon</creator><creator>Ben-David, Eyal</creator><creator>Goldshtein, Sasha</creator><creator>Schuster, Tal</creator><creator>Herzig, Jonathan</creator><creator>Elidan, Gal</creator><creator>Globerson, Amir</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240605</creationdate><title>TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools</title><author>Caciularu, Avi ; Jacovi, Alon ; Ben-David, Eyal ; Goldshtein, Sasha ; Schuster, Tal ; Herzig, Jonathan ; Elidan, Gal ; Globerson, Amir</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2406_036183</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Caciularu, Avi</creatorcontrib><creatorcontrib>Jacovi, Alon</creatorcontrib><creatorcontrib>Ben-David, Eyal</creatorcontrib><creatorcontrib>Goldshtein, Sasha</creatorcontrib><creatorcontrib>Schuster, Tal</creatorcontrib><creatorcontrib>Herzig, Jonathan</creatorcontrib><creatorcontrib>Elidan, Gal</creatorcontrib><creatorcontrib>Globerson, Amir</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Caciularu, Avi</au><au>Jacovi, Alon</au><au>Ben-David, Eyal</au><au>Goldshtein, Sasha</au><au>Schuster, Tal</au><au>Herzig, Jonathan</au><au>Elidan, Gal</au><au>Globerson, Amir</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools</atitle><date>2024-06-05</date><risdate>2024</risdate><abstract>Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries, and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we analyze model performance across three components: table-generation, Pandas command-generation, and execution. Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. Specifically, we propose to add "tools" for each of the above steps, and implement each such tool with few-shot prompting. 
This approach shows an improvement over existing prompting techniques, offering a promising direction for enhancing model capabilities in these tasks.</abstract><doi>10.48550/arxiv.2406.03618</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2406.03618
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2406_03618
source	arXiv.org
subjects	Computer Science - Computation and Language
title	TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T01%3A37%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TACT:%20Advancing%20Complex%20Aggregative%20Reasoning%20with%20Information%20Extraction%20Tools&rft.au=Caciularu,%20Avi&rft.date=2024-06-05&rft_id=info:doi/10.48550/arxiv.2406.03618&rft_dat=%3Carxiv_GOX%3E2406_03618%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
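
The description field above names three components: table generation, Pandas command generation, and execution. The following is a minimal, hypothetical sketch of the last two stages only. The table contents, column names, and query are invented for illustration and are not taken from the TACT dataset or from the authors' implementation; the table is assumed to have already been extracted from a text.

```python
import pandas as pd

# Hypothetical table that an extraction step might produce from a text.
# All values are invented for illustration only.
table = pd.DataFrame(
    {
        "player": ["A", "B", "C"],
        "team": ["Red", "Red", "Blue"],
        "points": [12, 7, 20],
    }
)

# Hypothetical query: "How many points did the Red team score in total?"
# A Pandas command answering it: filter rows by team, then sum one column.
red_rows = table[table["team"] == "Red"]
answer = int(red_rows["points"].sum())

print(answer)
```

The point of the sketch is that the aggregation is done by executing an explicit command over a structured table, not by reasoning over the raw text directly.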