SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | While many natural language inference (NLI) datasets target certain semantic
phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to
the best of our knowledge, there is no NLI dataset that involves diverse types
of spatial expressions and reasoning. We fill this gap by semi-automatically
creating an NLI dataset for spatial reasoning, called SpaceNLI. The data
samples are automatically generated from a curated set of reasoning patterns,
where the patterns are annotated with inference labels by experts. We test
several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and
the systems' capacity for spatial reasoning. Moreover, we introduce Pattern
Accuracy and argue that it is a more reliable and stricter measure than
plain accuracy for evaluating a system's performance on pattern-based generated data
samples. Based on the evaluation results we find that the systems obtain
moderate results on the spatial NLI problems but lack consistency per inference
pattern. The results also reveal that non-projective spatial inferences
(especially due to the "between" preposition) are the most challenging ones. |
---|---|
DOI: | 10.48550/arxiv.2307.02269 |
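
The summary contrasts plain accuracy with Pattern Accuracy but does not spell out the formula. The sketch below illustrates one plausible reading, not the authors' definition: a pattern counts as solved only if the share of its generated samples that are predicted correctly reaches a threshold, and Pattern Accuracy is the fraction of solved patterns. The record fields (`pattern`, `gold`, `pred`), the `threshold` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def sample_accuracy(records):
    """Plain accuracy: fraction of individual samples predicted correctly."""
    return sum(r["gold"] == r["pred"] for r in records) / len(records)


def pattern_accuracy(records, threshold=1.0):
    """Fraction of patterns whose per-pattern accuracy is >= threshold.

    Assumed definition for illustration; the threshold of 1.0 demands
    that every sample generated from a pattern is predicted correctly.
    """
    hits_by_pattern = defaultdict(list)
    for r in records:
        hits_by_pattern[r["pattern"]].append(r["gold"] == r["pred"])
    solved = sum(
        sum(hits) / len(hits) >= threshold
        for hits in hits_by_pattern.values()
    )
    return solved / len(hits_by_pattern)


def make_records(pattern, gold, preds):
    """Build toy records for one hypothetical pattern with a shared gold label."""
    return [{"pattern": pattern, "gold": gold, "pred": p} for p in preds]


# Toy predictions (invented): three patterns, ten samples in total.
records = (
    make_records("p1", "entailment", ["entailment"] * 3 + ["neutral"])
    + make_records("p2", "contradiction", ["contradiction"] * 4)
    + make_records("p3", "neutral", ["neutral", "entailment"])
)

print(sample_accuracy(records))         # 8 of 10 samples correct
print(pattern_accuracy(records))        # only p2 is fully consistent
print(pattern_accuracy(records, 0.75))  # p1 and p2 reach the threshold
```

In this toy setting the system gets 80% of the samples right yet fully solves only one of three patterns, which is the kind of per-pattern inconsistency the summary describes.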