Language Models and World Knowledge: Injecting structured information using masked language modeling and adapters

Bibliographic Details
Main Author: Wold, Sondre
Format: Dissertation
Language: English

Description
Summary: Combining structured information with language models is a long-standing problem in NLP. Building on previous work, we study how lightweight neural networks, known as adapters, can be used to inject information from a knowledge graph into two popular pre-trained language models based on the transformer architecture. The adapters are trained using the masked language modeling objective over triples extracted from ConceptNet, a knowledge graph that captures a range of world knowledge and commonsense concepts and relations. Experiments on three popular NLP benchmarks believed to require world knowledge and commonsense reasoning abilities show that the adapter injection does not increase performance on these tasks. However, probing experiments indicate that the injected models are better at recovering factual information seen during training, and that this can be achieved by introducing only a small number of additional parameters to the overall model. Ablation studies show that the injected knowledge is distributed evenly across the layers of the underlying model. Furthermore, using the AdapterFusion framework, we propose and perform initial testing of a two-step learning algorithm that partitions ConceptNet by predicate type and trains a set of disjoint adapters that are later combined using an attention mechanism. For reproducibility, we present a reproduction of the most closely related previous work and release our code.
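
To illustrate the injection recipe described in the summary, the following is a minimal sketch, assuming the Hugging Face `transformers` library together with the `adapters` package (the successor of adapter-transformers); the choice of `bert-base-uncased`, the verbalization templates, the example triples, and the hyperparameters are illustrative assumptions, not the thesis code.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)
import adapters

model_name = "bert-base-uncased"  # assumed stand-in for the thesis's models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

adapters.init(model)               # enable adapter support on the base model
model.add_adapter("conceptnet")    # small bottleneck modules: few extra parameters
model.train_adapter("conceptnet")  # freeze the base model, train only the adapter

def verbalize(head, relation, tail):
    # Hypothetical templates mapping a ConceptNet triple to a sentence.
    templates = {"IsA": "{} is a {}.", "UsedFor": "{} is used for {}."}
    return templates[relation].format(head, tail)

triples = [("guitar", "IsA", "musical instrument"),
           ("fork", "UsedFor", "eating")]
encodings = [tokenizer(verbalize(*t)) for t in triples]

# Standard masked language modeling objective: randomly mask tokens in the
# verbalized triples and train the adapter to predict them.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
batch = collator(encodings)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
loss = model(**batch).loss  # only adapter weights receive gradient updates
loss.backward()
optimizer.step()
```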
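The two-step AdapterFusion procedure mentioned in the summary could look roughly as follows. This is again a hedged sketch: the `Fuse` composition and the fusion methods come from the `adapters` library, while the three-relation partition and the omitted training loops are illustrative assumptions.

```python
from transformers import AutoModelForMaskedLM
import adapters
from adapters.composition import Fuse

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
adapters.init(model)

# Step 1: partition ConceptNet by predicate type and train one disjoint
# adapter per relation (each with the MLM recipe sketched above).
relations = ["IsA", "UsedFor", "PartOf"]  # assumed partition
for rel in relations:
    model.add_adapter(rel)
    # model.train_adapter(rel)  ...train on triples of this relation only...

# Step 2: freeze the trained adapters and learn an attention mechanism
# (AdapterFusion) that combines their outputs.
fusion = Fuse(*relations)
model.add_adapter_fusion(fusion)
model.train_adapter_fusion(fusion)  # trains only the fusion attention weights
```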