To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
Saved in:
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Summary: Leveraging large amounts of unlabeled data using Transformer-like
architectures, such as BERT, has gained popularity in recent times owing to
their effectiveness in learning general representations that can then be
fine-tuned for downstream tasks to much success. However, training these
models can be costly both from an economic and an environmental standpoint.
In this work, we investigate how to use unlabeled data effectively: by
exploring the task-specific semi-supervised approach, Cross-View Training
(CVT), and comparing it with the task-agnostic BERT in multiple settings that
include domain- and task-relevant English data. CVT uses a much lighter model
architecture, and we show that it achieves similar performance to BERT on a
set of sequence tagging tasks, with lower financial and environmental impact.
DOI: 10.48550/arxiv.2010.14042