Nonlinear ISA with Auxiliary Variables for Learning Speech Representations
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | This paper extends recent work on nonlinear Independent Component Analysis
(ICA) by introducing a theoretical framework for nonlinear Independent Subspace
Analysis (ISA) in the presence of auxiliary variables. Observed
high-dimensional acoustic features such as log Mel spectrograms can be regarded as
surface-level manifestations of nonlinear transformations over individual
multivariate sources of information, such as speaker characteristics and
phonological content. Under the assumptions of energy-based models, we use the
theory of nonlinear ISA to propose an algorithm that learns unsupervised speech
representations whose subspaces are independent and potentially highly
correlated with the original non-stationary multivariate sources. We show how
nonlinear ICA with auxiliary variables can be extended to a generic
identifiable model for subspaces, and we provide sufficient
conditions for the identifiability of these high-dimensional subspaces. Our
proposed methodology is generic and can be integrated with standard
unsupervised approaches to learn speech representations with subspaces that can
theoretically capture independent higher-order speech signals. We evaluate the
gains of our algorithm when integrated with the Autoregressive Predictive
Coding (APC) model by showing empirical results on the speaker verification
and phoneme recognition tasks. |
---|---|
DOI: | 10.48550/arxiv.2007.12948 |