Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Format: Article
Language: English
Online access: Order full text
Abstract: Self-supervised visual pretraining has shown significant progress recently. Among those methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for speech and visual tasks are both continuous, so it is natural to consider applying a similar objective to speech representation learning. In this paper, we propose Speech SimCLR, a new self-supervised objective for speech representation learning. During training, Speech SimCLR applies augmentation to raw speech and its spectrogram. Its objective combines a contrastive loss, which maximizes agreement between differently augmented samples in the latent space, with a reconstruction loss on the input representation. The proposed method achieved competitive results on speech emotion recognition and speech recognition.
DOI: 10.48550/arxiv.2010.13991
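The abstract describes a two-part objective: SimCLR's NT-Xent contrastive loss, which pulls projections of two augmentations of the same utterance together while pushing apart other samples in the batch, plus a reconstruction loss on the input features. The following is a minimal PyTorch sketch of such a combined objective; the function names, the `alpha` weighting, and the choice of L1 for the reconstruction term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss over a batch of paired embeddings.

    z1, z2: (N, D) projections of two augmented views of the same N utterances.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # exclude self-similarity
    # The positive for sample i is its other view: index i + n (or i - n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

def speech_simclr_loss(z1, z2, recon, target, alpha=1.0):
    """Combined objective: contrastive agreement plus input reconstruction.

    recon, target: reconstructed and original input features (e.g. spectrogram).
    alpha is a hypothetical weight balancing the two terms.
    """
    return nt_xent_loss(z1, z2) + alpha * F.l1_loss(recon, target)
```

In this sketch, `z1` and `z2` would come from encoding two differently augmented views (of the raw waveform and/or its spectrogram) through the same encoder and projection head, while the reconstruction head predicts the unaugmented input representation from the encoder output.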