Method and device for processing multi-modal data and robot
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Summary: | A method for processing multi-modal data comprises the following steps: acquiring a depth image and deriving the spatial position information of each user from it; acquiring audio data, extracting the voiceprint feature information of different users from the audio data, localizing the speaker according to the voiceprint feature information, and obtaining the sound field positioning information of the corresponding user; and associating the spatial position information with the sound field positioning information, so that the voiceprint feature information of each user is linked to the corresponding user. The invention further provides a device for processing multi-modal data and a robot. According to the method provided by the embodiment of the invention, fusing the multi-modal data and making decisions over the combined result improves perception and interaction, and provides more information for online model decision making, so that the accuracy of an overall decisi |
---|
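The association step in the summary can be illustrated with a minimal sketch. The record does not say how the patent performs the matching, so everything below is an assumption for illustration: the names `UserPosition`, `SoundSource`, and `associate` are hypothetical, user positions are reduced to a horizontal offset and a forward distance taken from the depth image, sound field positioning is reduced to a single direction-of-arrival angle per voiceprint, and the matching is a simple greedy nearest-angle rule with a tolerance.

```python
import math
from dataclasses import dataclass


@dataclass
class UserPosition:
    """Hypothetical user location derived from a depth image."""
    user_id: str
    x: float  # metres to the right of the sensor axis
    z: float  # metres forward from the sensor


@dataclass
class SoundSource:
    """Hypothetical sound field localization result for one voiceprint."""
    voiceprint_id: str
    azimuth_deg: float  # 0 = straight ahead, positive = to the right


def azimuth_of(pos: UserPosition) -> float:
    """Bearing of a user as seen from the sensor, in degrees."""
    return math.degrees(math.atan2(pos.x, pos.z))


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def associate(users, sources, max_error_deg=15.0):
    """Greedily pair each voiceprint with the user at the closest bearing.

    Returns a dict mapping voiceprint_id to user_id. Sources with no user
    within max_error_deg (an assumed tolerance) stay unassigned.
    """
    candidates = sorted(
        (angle_diff(azimuth_of(u), s.azimuth_deg), u.user_id, s.voiceprint_id)
        for u in users
        for s in sources
    )
    matched = {}
    used_users = set()
    for error, user_id, voiceprint_id in candidates:
        if error > max_error_deg:
            break  # remaining candidates are even further apart
        if user_id in used_users or voiceprint_id in matched:
            continue
        matched[voiceprint_id] = user_id
        used_users.add(user_id)
    return matched
```

For example, with one user at the front left and one at the front right, a voice arriving from roughly 40 degrees to the right would be attached to the right-hand user, while a voice arriving from straight ahead would stay unassigned under the assumed 15 degree tolerance. A real system would likely match in three dimensions and track associations over time.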