Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning with a contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper shows, theoretically and experimentally, that contrastive learning-based sentence encoders implicitly weight words according to information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, at the lower bound of the optimal value of the contrastive learning objective, the norm of a word's embedding reflects the information gain associated with the distribution of its surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods for measuring the models' implicit weighting (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
DOI: 10.48550/arxiv.2310.15921
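The abstract compares model attributions against two information-theoretic word statistics. As a rough illustration of how such quantities can be estimated from a corpus, the sketch below computes each word's self-information, -log p(w), and an information-gain score defined here as the KL divergence between the distribution of words co-occurring with w and the overall word distribution. The toy corpus, window size, and function names are hypothetical choices for this example, not the paper's actual setup.

```python
import math
from collections import Counter
from itertools import chain

# Hypothetical toy corpus; the paper's experiments use real sentence datasets.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a cat and a dog played on the mat".split(),
]

def word_probs(sentences):
    """Unigram distribution p(w) estimated from raw counts."""
    counts = Counter(chain.from_iterable(sentences))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def self_information(p_w):
    """Self-information -log p(w): rarer words carry more information."""
    return {w: -math.log(p) for w, p in p_w.items()}

def information_gain(sentences, p_w, window=2, alpha=1e-3):
    """KL( p(c | w) || p(c) ) over context words c within a +/-window span.

    This is one simple corpus-level proxy for the information gain of the
    distribution of surrounding words mentioned in the abstract; the paper's
    formal definition may differ.
    """
    context_counts = {w: Counter() for w in p_w}
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    context_counts[w][sent[j]] += 1
    vocab = list(p_w)
    gains = {}
    for w, ctx in context_counts.items():
        total = sum(ctx.values()) + alpha * len(vocab)  # additive smoothing
        kl = 0.0
        for c in vocab:
            p_c_given_w = (ctx[c] + alpha) / total
            kl += p_c_given_w * math.log(p_c_given_w / p_w[c])
        gains[w] = kl
    return gains

p_w = word_probs(corpus)
si = self_information(p_w)
ig = information_gain(corpus, p_w)
for w in sorted(p_w, key=ig.get, reverse=True):
    print(f"{w:>8}  self-info={si[w]:.3f}  info-gain={ig[w]:.3f}")
```

In the paper, such corpus statistics are compared against per-word attribution scores (from Integrated Gradients or SHAP) of the fine-tuned encoder; this sketch only covers the corpus-statistics side.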