Speech Representation Analysis based on Inter- and Intra-Model Similarities
Saved in:
Format: Article
Language: English
Abstract: Self-supervised models have revolutionized speech processing, achieving new
levels of performance in a wide variety of tasks with limited resources.
However, the inner workings of these models are still opaque. In this paper, we
aim to analyze the encoded contextual representations of these foundation models
based on their inter- and intra-model similarity, independent of any external
annotation and task-specific constraint. We examine different SSL models,
varying their training paradigm -- contrastive (Wav2Vec2.0) and predictive
(HuBERT) -- and their size (base and large). We explore these models at
different levels of localization/distributivity of information, including (i)
individual neurons; (ii) layer representations; (iii) attention weights; and (iv)
comparisons of the representations with their fine-tuned counterparts. Our results
highlight that these models converge to similar representation subspaces but
not to similar neuron-localized concepts\footnote{A concept represents a
coherent fragment of knowledge, such as ``a class containing certain objects as
elements, where the objects have certain properties''.}. We publicly release our
code to facilitate further research.
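The inter-model similarity analysis described in the abstract compares layer representations of two models on the same inputs. One standard way to do this is linear centered kernel alignment (CKA); the abstract does not name the specific similarity measure, so this is an illustrative sketch rather than the authors' exact method, and the random matrices stand in for real Wav2Vec2.0/HuBERT activations.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices X (n x d1) and
    Y (n x d2), whose rows correspond to the same n inputs.
    Returns a value in [0, 1]; 1 means identical subspace geometry
    up to orthogonal transform and isotropic scaling."""
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Toy stand-ins for layer activations of two models on 100 shared inputs.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 64))
B = A @ rng.standard_normal((64, 32))  # a linear readout of A

sim_self = linear_cka(A, A)  # identical representations score 1.0
sim_ab = linear_cka(A, B)
```

In a real analysis, `A` and `B` would be frame-level hidden states extracted from corresponding layers of two SSL models on the same utterances, and the pairwise layer-by-layer CKA matrix would reveal which layers converge to similar representation subspaces.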
DOI: 10.48550/arxiv.2406.16099