How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy
Format: Article
Language: English
Abstract: It is widely accepted that fine-tuning pre-trained language models usually brings about performance improvements in downstream tasks. However, there are limited studies on the reasons behind this effectiveness, particularly from the viewpoint of structural changes in the embedding space. To fill this gap, in this paper we analyze the extent to which the isotropy of the embedding space changes after fine-tuning. We demonstrate that, even though isotropy is a desirable geometrical property, fine-tuning does not necessarily result in isotropy enhancements. Moreover, local structures in pre-trained contextual word representations (CWRs), such as those encoding token types or frequency, undergo massive changes during fine-tuning. Our experiments show dramatic growth in the number of elongated directions in the embedding space, which, in contrast to pre-trained CWRs, carry the essential linguistic knowledge in the fine-tuned embedding space, making existing isotropy enhancement methods ineffective.
DOI: 10.48550/arxiv.2109.04740
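The abstract refers to the isotropy of the embedding space, its elongated (dominant) directions, and existing isotropy enhancement methods, but this record does not spell out the definitions used in the paper. For reference, below is a minimal sketch, assuming the measure and post-processing most commonly used in this line of work: the partition-function isotropy score and the "all-but-the-top" enhancement of Mu & Viswanath (2018). The function names, toy data, and dimensions here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def isotropy_score(W: np.ndarray) -> float:
    """Partition-function isotropy measure (Mu & Viswanath, 2018):
    I(W) = min_c Z(c) / max_c Z(c), where c ranges over the eigenvectors
    of W^T W and Z(c) = sum_i exp(c . w_i). Values near 1.0 indicate an
    isotropic space; values near 0.0 indicate strong anisotropy."""
    _, eigvecs = np.linalg.eigh(W.T @ W)   # candidate directions c (columns)
    z = np.exp(W @ eigvecs).sum(axis=0)    # Z(c) for all directions at once
    return float(z.min() / z.max())

def all_but_the_top(W: np.ndarray, d: int = 3) -> np.ndarray:
    """A common isotropy-enhancement post-processing ("all-but-the-top"):
    center the embeddings, then project out the top-d principal
    components, i.e. the most elongated directions."""
    W = W - W.mean(axis=0)                 # remove the common mean vector
    # Top-d right singular vectors = dominant principal directions.
    _, _, vt = np.linalg.svd(W, full_matrices=False)
    top = vt[:d]                           # shape (d, dim)
    return W - (W @ top.T) @ top           # project out elongated directions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy anisotropic "embedding space": a shared offset creates one
    # strongly elongated direction, loosely mimicking pre-trained CWRs.
    W = rng.standard_normal((2000, 64)) + 4.0
    print(f"before enhancement: {isotropy_score(W):.3f}")
    print(f"after enhancement:  {isotropy_score(all_but_the_top(W)):.3f}")
```

On toy data like this, projecting out the dominant directions pushes the score toward 1.0. The paper's finding is that after fine-tuning those same elongated directions carry the essential linguistic knowledge, which is why such removal-based enhancement methods stop helping.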