Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity
Format: Article
Language: English
Online access: Full text
Summary: While BERT produces high-quality sentence embeddings, its pre-training
computational cost is a significant drawback. In contrast, ELECTRA provides a
cost-effective pre-training objective and downstream task performance
improvements, but worse sentence embeddings. The community tacitly stopped
utilizing ELECTRA's sentence embeddings for semantic textual similarity (STS).
We notice a significant drop in performance for the ELECTRA discriminator's
last layer in comparison to prior layers. We explore this drop and propose a
way to repair the embeddings using a novel truncated model fine-tuning (TMFT)
method. TMFT improves the Spearman correlation coefficient by over $8$ points
while increasing parameter efficiency on the STS Benchmark. We extend our
analysis to various model sizes, languages, and two other tasks. Further, we
discover the surprising efficacy of ELECTRA's generator model, which performs
on par with BERT, using significantly fewer parameters and a substantially
smaller embedding size. Finally, we observe boosts by combining TMFT with word
similarity or domain adaptive pre-training.
DOI: 10.48550/arxiv.2402.13130