Multimodal Unsupervised Domain Adaptation for Predicting Speaker Characteristics from Video

Bibliographic details
Published in:	SN Computer Science, 2024-06, Vol. 5 (5), p. 531, Article 531
Main authors:	Thomas, Chinchu; Udhayanan, Prateksha; Yadav, Ayush; Purvaj, Seethamraju; Jayagopi, Dinesh Babu
Format:	Article
Language:	English
Subjects:	Accuracy; Adaptation; Advances on Image Processing and Vision Engineering; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Datasets; Deep learning; Distance learning; Distillation; Education; Generative adversarial networks; Information Systems and Communication Service; Knowledge management; Machine learning; Motion pictures; Multimedia; Original Research; Pattern Recognition and Graphics; Social networks; Software Engineering/Programming and Operating Systems; Student participation; Vision
Online access:	Full text

Description
Abstract:	Persuasion and expertise are two central skills of a speaker. In this work, we propose a multimodal unsupervised domain adaptation method to predict the persuasiveness and expertise of the speaker in a video. The proposed approach uses MAG-BERT to model the multimodal feature space and adversarial discriminative domain adaptation, a generalized approach to unsupervised domain adaptation. To reduce the domain shift, knowledge distillation is added to the adversarial adaptation. We explore two losses within the adversarial framework, the standard generative adversarial network (GAN) loss and the Wasserstein generative adversarial network (WGAN) loss, and two methods for incorporating more domain knowledge, knowledge distillation and probabilistic knowledge transfer. The experiments demonstrate that the proposed approach is promising for predicting the persuasiveness and expertise of speakers in videos using multimodal data. The best results for persuasiveness prediction using multimodal data are obtained with the MAG-BERT WGAN model (accuracy 68%, F1-score 0.81) and with the BERT WGAN model (accuracy 75%, F1-score 0.81). For expertise prediction, the MAG-BERT WGAN model achieves an accuracy of 57% and an F1-score of 0.63.
ISSN:	2662-995X, 2661-8907
DOI:	10.1007/s42979-024-02723-6