BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Bibliographic details
Main authors: Kazemi, Mehran; Yuan, Quan; Bhatia, Deepti; Kim, Najoung; Xu, Xin; Imbrasaite, Vaiva; Ramachandran, Deepak
Format: Article
Language: eng
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
Online access: Order full text
creator Kazemi, Mehran; Yuan, Quan; Bhatia, Deepti; Kim, Najoung; Xu, Xin; Imbrasaite, Vaiva; Ramachandran, Deepak
description Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluations for automated reasoning assume access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
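
The conflict-resolution strategy the abstract describes (adopt the claim from the more-preferred source when sources contradict each other) can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration, not code from the paper or the BoardgameQA dataset; the Claim class, the resolve function, and the example sources are all assumptions made here for demonstration.

# Minimal sketch of preference-based conflict resolution, as described in the
# abstract above. Everything here (Claim, resolve, the example sources) is
# hypothetical and for illustration only.
from dataclasses import dataclass

@dataclass
class Claim:
    source: str     # which information source asserted this
    statement: str  # the proposition in question
    negated: bool   # True if the source asserts the negation of the statement

def resolve(claims, preference):
    # preference: source names ordered most-preferred first; every claim's
    # source is assumed to appear in this list.
    best = {}
    for claim in claims:
        rank = preference.index(claim.source)
        kept = best.get(claim.statement)
        if kept is None or rank < preference.index(kept.source):
            best[claim.statement] = claim
    return list(best.values())

# Two sources disagree on the same proposition; the more credible one wins.
claims = [
    Claim(source="rulebook", statement="the cat attacks the mouse", negated=False),
    Claim(source="forum_post", statement="the cat attacks the mouse", negated=True),
]
preference = ["rulebook", "forum_post"]  # e.g., ranked by source credibility

for claim in resolve(claims, preference):
    prefix = "not " if claim.negated else ""
    print(f"adopted ({claim.source}): {prefix}{claim.statement}")

This mirrors the defeasible-reasoning setup: contradictory conclusions are resolved by the preference order over their sources rather than by rejecting the knowledge base as inconsistent.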
doi_str_mv 10.48550/arxiv.2306.07934
format Article
identifier DOI: 10.48550/arxiv.2306.07934
language eng
recordid cdi_arxiv_primary_2306_07934
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
title BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T10%3A06%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=BoardgameQA:%20A%20Dataset%20for%20Natural%20Language%20Reasoning%20with%20Contradictory%20Information&rft.au=Kazemi,%20Mehran&rft.date=2023-06-13&rft_id=info:doi/10.48550/arxiv.2306.07934&rft_dat=%3Carxiv_GOX%3E2306_07934%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true