Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Proceedings of the ICML 2022 Expressive Vocalizations Workshop and
Competition: Recognizing, Generating, and Personalizing Vocal Bursts. The ICML Expressive Vocalizations (ExVo) Multi-task challenge 2022 focuses
on understanding the emotional facets of non-linguistic vocalizations
(vocal bursts, VB). The objective of the challenge is to predict emotional
intensities for VBs; as a multi-task challenge, it also requires predicting
the speaker's age and native country. For this challenge, we study and compare two
distinct embedding spaces: self-supervised learning (SSL) based
embeddings and task-specific supervised learning based embeddings. To
that end, we investigate feature representations obtained from several pre-trained
SSL neural networks and task-specific supervised classification neural
networks. Our studies show that the best performance is obtained with a hybrid
approach that uses predictions derived from both SSL and task-specific supervised
learning. On the test set, our best system surpasses the ComParE baseline
by a relative margin of $13\%$ on $S_{MTL}$, the harmonic mean of all
sub-task scores. |
---|---|
DOI: | 10.48550/arxiv.2206.11968 |