Speaker Intimacy Estimation in Chat-Talks Based on Verbal and Non-Verbal Information
Published in: | IEEE Access 2024, Vol. 12, p. 184592-184606 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Conversations based on mutual intimacy are critical for maintaining positive relationships. A detailed understanding of speaker relationships in dialogues enhances various applications, such as information recommendation systems. Such systems, when interacting with multiple users, can provide more tailored information by understanding the users' relationships. Furthermore, dialogue systems, which are becoming increasingly prevalent in society, can foster long-term user engagement by recognizing and responding to the intimacy levels of users. This study explores a method for estimating the intimacy levels of speakers and dialogue partners in conversational exchanges. Our approach utilizes a multimodal corpus of natural conversations with 71 Japanese participants, complete with metadata indicating each speaker's perceived intimacy level. We identified key features for estimating intimacy by analyzing the statistical parameters of these features. Our comprehensive analysis encompassed both verbal and non-verbal information, including prosody, gestures, and facial expressions. The proposed intimacy estimation model combines multimodal features using a multi-stream Bi-directional Long Short-Term Memory (BLSTM) network and captures the contextual information of conversations with a Context BLSTM. Our model's effectiveness is demonstrated through comparisons with several baseline models. Experimental results show that our proposed model significantly improves the overall performance compared with other models. The best baseline model, a RoBERTa-based method, achieved an F1 score of 0.571, whereas our method achieved an F1 score of 0.594. In particular, an ablation study shows that combining verbal and non-verbal features is useful for intimacy estimation. The performance was further improved by extending the dialogue context: by observing eight utterance exchanges, the proposed model can estimate three levels of intimacy with an F1 score of 0.666. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3507945 |
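
The summary reports F1 scores for estimating three levels of intimacy, but this record does not say how the per-class scores were averaged. As a reading aid only, the sketch below shows one common choice, the macro-averaged F1 (the unweighted mean of the per-class F1 scores). The averaging scheme, the level names `low`, `middle`, and `high`, and the toy labels are all assumptions made for illustration; none of them come from the article.

```python
def macro_f1(gold, pred, labels):
    """Return the unweighted mean of per-class F1 scores.

    Assumption: macro averaging. The catalog record does not state
    which averaging the article used.
    """
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against empty denominators for classes never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical level names and toy labels, invented for this example.
LEVELS = ("low", "middle", "high")
gold = ["low", "low", "middle", "middle", "high", "high"]
pred = ["low", "middle", "middle", "middle", "high", "low"]

score = macro_f1(gold, pred, LEVELS)
print(round(score, 3))  # per-class F1 is 0.5, 0.8, 0.667, so this prints 0.656
```

Under macro averaging every intimacy level counts equally, regardless of how many utterances carry it. If the article instead used a weighted or micro average, the reported figures (0.571, 0.594, 0.666) would have to be read accordingly.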