KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models
Previous work shows the great potential of pre-trained language models (PLMs) for storing a large amount of factual knowledge. However, to determine whether PLMs can serve as reliable knowledge sources and be used as alternative knowledge bases (KBs), we need to further explore some critical features of PLM...
Saved in:
Main authors: | Gao, Daniel; Jia, Yantao; Li, Lei; Fu, Chengzhen; Dou, Zhicheng; Jiang, Hao; Zhang, Xinyu; Chen, Lei; Cao, Zhao |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Computation and Language |
Online access: | Order full text |
creator | Gao, Daniel; Jia, Yantao; Li, Lei; Fu, Chengzhen; Dou, Zhicheng; Jiang, Hao; Zhang, Xinyu; Chen, Lei; Cao, Zhao |
description | Previous work shows the great potential of pre-trained language models (PLMs)
for storing a large amount of factual knowledge. However, to determine whether
PLMs can serve as reliable knowledge sources and be used as alternative knowledge
bases (KBs), we need to further explore some of their critical features. First,
knowledge memorization and identification abilities: traditional KBs can store
various types of entities and relationships; do PLMs have enough knowledge
capacity to store different types of knowledge? Second, reasoning ability: a
qualified knowledge source should not only provide a collection of facts but
also support a symbolic reasoner. Can PLMs derive new knowledge from the
correlations between facts? To evaluate these features of PLMs, we propose a
benchmark named the Knowledge Memorization, Identification, and Reasoning test
(KMIR). KMIR covers three types of knowledge (general knowledge,
domain-specific knowledge, and commonsense) and provides 184,348 well-designed
questions. Preliminary experiments with various representative pre-trained
language models on KMIR reveal several interesting phenomena: 1) the
memorization ability of PLMs depends more on the number of parameters than on
the training scheme; 2) current PLMs struggle to remember facts robustly;
3) model compression retains the amount of stored knowledge well but hurts
the identification and reasoning abilities. We hope KMIR can
facilitate the design of PLMs as better knowledge sources. |
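The three abilities the abstract distinguishes can be illustrated with a minimal sketch. Everything below is hypothetical (KMIR's actual question formats and data are not reproduced here): memorization is modeled as cloze-style recall, identification as multiple choice over distractors, and reasoning as composing two stored facts.

```python
# Hypothetical illustration of the three ability types named in the
# abstract; the facts, relation names, and scoring are invented for
# this sketch and are not taken from the KMIR benchmark itself.
FACTS = {
    ("Paris", "capital_of"): "France",
    ("France", "continent"): "Europe",
}

def memorization(head, relation):
    # Cloze-style recall: complete "Paris is the capital of ____".
    return FACTS.get((head, relation))

def identification(head, relation, choices):
    # Multiple choice: pick the stored answer out of distractors.
    answer = FACTS.get((head, relation))
    return answer if answer in choices else None

def reasoning(head, rel1, rel2):
    # Two-hop composition: derive a new fact from two stored ones,
    # e.g. Paris -> France -> Europe.
    mid = FACTS.get((head, rel1))
    return FACTS.get((mid, rel2)) if mid is not None else None
```

For example, `memorization("Paris", "capital_of")` recalls a single stored fact, while `reasoning("Paris", "capital_of", "continent")` yields an answer that appears in no single fact, which is the distinction the benchmark's reasoning questions probe.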
doi_str_mv | 10.48550/arxiv.2202.13529 |
format | Article |
creationdate | 2022-02-27 |
fullrecord | [raw XML record export omitted; it duplicates the title, creators, description, DOI, and subject fields listed in this record] |
oa | free_for_read |
rights | http://creativecommons.org/licenses/by/4.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2202.13529 |
language | eng |
recordid | cdi_arxiv_primary_2202_13529 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T05%3A19%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=KMIR:%20A%20Benchmark%20for%20Evaluating%20Knowledge%20Memorization,%20Identification%20and%20Reasoning%20Abilities%20of%20Language%20Models&rft.au=Gao,%20Daniel&rft.date=2022-02-27&rft_id=info:doi/10.48550/arxiv.2202.13529&rft_dat=%3Carxiv_GOX%3E2202_13529%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |