Transfer fine-tuning of BERT with phrasal paraphrases

Bibliographic Details
Published in: Computer Speech & Language, 2021-03, Vol. 66, p. 101164, Article 101164
Main Authors: Arase, Yuki; Tsujii, Junichi
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract:
Highlights:
• Transfer fine-tuning yields representations suitable for specific tasks; in this paper we focus on sentence pair modelling.
• The method helps the BERT model converge more quickly with a smaller corpus.
• It also realizes performance gains while maintaining the model size.
• Simple features outperform elaborate ones in phrasal paraphrase classification.

Sentence pair modelling is defined as the task of identifying the semantic interaction between a sentence pair, i.e., paraphrase and textual entailment identification and semantic similarity measurement. It constitutes a set of crucial tasks for research in the area of natural language understanding. Sentence representation learning is a fundamental technology for sentence pair modelling, where the development of the BERT model realised a breakthrough. We have recently proposed transfer fine-tuning using phrasal paraphrases to allow BERT's representations to be suitable for semantic equivalence assessment between sentences while maintaining the model size. Herein, we reveal that transfer fine-tuning with simplified feature generation allows us to generate representations that are widely effective across different types of sentence pair modelling tasks. Detailed analysis confirms that our transfer fine-tuning helps the BERT model converge more quickly with a smaller corpus for fine-tuning.
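The abstract describes an intermediate "transfer fine-tuning" stage on phrasal paraphrase classification that is applied to BERT before the usual task-specific fine-tuning. Below is a minimal, hypothetical sketch of such a stage, assuming the Hugging Face transformers and PyTorch libraries; the phrase-pair input, span indices, and classifier head are illustrative placeholders and not the authors' implementation.

# Hedged sketch of an intermediate ("transfer") fine-tuning stage for BERT on
# phrasal paraphrase classification. The data, span indices, and classifier
# head are illustrative assumptions, not the method from the paper.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

class PhrasePairClassifier(nn.Module):
    """Classifies whether two phrases inside a packed sentence pair are paraphrases."""
    def __init__(self, encoder, hidden_size=768, num_labels=2):
        super().__init__()
        self.encoder = encoder
        # "Simple features": concatenation of the two mean-pooled phrase vectors.
        self.classifier = nn.Linear(hidden_size * 2, num_labels)

    def forward(self, input_ids, attention_mask, span1, span2):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool each phrase span (start, end) over the token dimension.
        p1 = hidden[:, span1[0]:span1[1]].mean(dim=1)
        p2 = hidden[:, span2[0]:span2[1]].mean(dim=1)
        return self.classifier(torch.cat([p1, p2], dim=-1))

model = PhrasePairClassifier(bert)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# Toy example: one sentence pair packed as a single BERT input; the span
# indices and the paraphrase label are made up for illustration.
enc = tokenizer("the quick dog", "the fast dog", return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"], span1=(1, 3), span2=(5, 7))
loss = loss_fn(logits, torch.tensor([1]))
loss.backward()
optimizer.step()
# After this intermediate stage, the encoder is fine-tuned as usual on the
# downstream sentence pair task (e.g., paraphrase or entailment data).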
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2020.101164