Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Reasoning over commonsense knowledge bases (CSKBs) whose elements are in the form of free text is an important yet difficult task in NLP. While CSKB completion only fills in missing links within the domain of the CSKB, CSKB population is proposed as an alternative with the goal of reasoning over unseen assertions from external resources. In this task, CSKBs are grounded to a large-scale eventuality (activity, state, and event) graph to discriminate whether novel triples from the eventuality graph are plausible or not. However, existing evaluations of the population task are either inaccurate (automatic evaluation with randomly sampled negative examples) or small in scale (human annotation). In this paper, we benchmark the CSKB population task with a new large-scale dataset by first aligning four popular CSKBs and then presenting a high-quality human-annotated evaluation set to probe neural models' commonsense reasoning ability. We also propose a novel inductive commonsense reasoning model that reasons over graphs. Experimental results show that generalizing commonsense reasoning to unseen assertions is inherently a hard task: models achieving high accuracy during training perform poorly on the evaluation set, leaving a large gap from human performance. We will make the data publicly available for future contributions. Code and data are available at https://github.com/HKUST-KnowComp/CSKB-Population.
DOI: 10.48550/arxiv.2109.07679
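The summary describes CSKB population as deciding whether a novel (head, relation, tail) assertion drawn from a large eventuality graph is plausible. As a rough illustration only, and not the graph-based model proposed in the paper, the sketch below frames this as binary triple classification with a pretrained text encoder; the encoder choice, the verbalization format, and the example relation label ("xEffect", in the style of ATOMIC) are assumptions made for illustration.

```python
# Illustrative sketch only: scores a verbalized (head, relation, tail) triple
# for plausibility with a text encoder plus a linear classifier. This is NOT
# the paper's inductive graph model; encoder name and verbalization are assumed.
import torch
from transformers import AutoModel, AutoTokenizer


class TriplePlausibilityScorer(torch.nn.Module):
    """Encode a verbalized commonsense triple and output a plausibility score."""

    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = torch.nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, head: str, relation: str, tail: str) -> torch.Tensor:
        # Verbalize the triple as one sequence, e.g.
        # "PersonX goes jogging [SEP] xEffect [SEP] PersonX feels tired"
        text = f"{head} [SEP] {relation} [SEP] {tail}"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        outputs = self.encoder(**inputs)
        cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] representation
        return torch.sigmoid(self.classifier(cls_embedding)).squeeze(-1)


if __name__ == "__main__":
    scorer = TriplePlausibilityScorer()
    # With untrained classifier weights the score is meaningless; it becomes
    # useful only after fine-tuning on CSKB triples with negative examples.
    score = scorer("PersonX goes jogging", "xEffect", "PersonX feels tired")
    print(f"plausibility score: {score.item():.3f}")
```

A text-only scorer of this kind is a typical baseline for triple plausibility tasks; the model proposed in the paper additionally reasons over the graph structure surrounding each assertion.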