Assessing kinetic meaning of music and dance via deep cross-modal retrieval

Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition, by considering dance...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neural computing & applications 2021-11, Vol.33 (21), p.14481-14493
Hauptverfasser: Raposo, Francisco Afonso, Martins de Matos, David, Ribeiro, Ricardo
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition, by considering dance as an overt expression of semantic aspects of music related to motor intention, in an artificial deep recurrent neural network that learns correlations between music audio and dance video. We claim that, just like human semantic cognition is based on multimodal statistical structures, joint statistical modeling of music and dance artifacts is expected to capture semantics of these modalities. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task, including dance styles in an unsupervised fashion. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and demonstrate the model can recommend music audio for dance video queries and vice versa.
ISSN:0941-0643
1433-3058
DOI:10.1007/s00521-021-06090-8