Method and device for processing multi-modal data and robot


Bibliographic details
Main authors:	DING LEI, CHEN FANG, DENG QICHUN, ZHANG YONGJIE
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; MUSICAL INSTRUMENTS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	A method for processing multi-modal data comprises the following steps: acquiring a depth image, and obtaining spatial position information for each user from the depth image; acquiring audio data, extracting voiceprint feature information of the different users from the audio data, localizing the speaker according to the voiceprint feature information, and obtaining sound field positioning information for the corresponding user; and associating the spatial position information with the sound field positioning information, so that the voiceprint feature information of each user is associated with the corresponding user. The invention further provides a device for processing multi-modal data, and a robot. According to the method provided by the embodiment of the invention, the perception and interaction effects are improved through fusion and comprehensive decision making of the multi-modal data, and more information can be provided for online model decision making, so that the accuracy of an overall decisi
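
The association step in the summary can be illustrated with a minimal sketch. The record does not say how the patent matches positions to sound sources, so everything below is an assumption made for illustration: the names `User`, `SoundSource`, `azimuth_of`, `angular_gap` and `associate` are hypothetical, user positions are taken as (x, y, z) coordinates in metres in the sensor frame, sound field positioning is reduced to a single azimuth per voiceprint, and matching is a simple greedy nearest-angle pairing with a tolerance.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class User:
    """A user located in the depth image (hypothetical structure)."""
    user_id: str
    position: Tuple[float, float, float]  # (x, y, z) in metres, sensor frame


@dataclass
class SoundSource:
    """A localized speaker identified by voiceprint (hypothetical structure)."""
    voiceprint_id: str
    azimuth_deg: float  # assumed direction of arrival, 0 = straight ahead


def azimuth_of(position: Tuple[float, float, float]) -> float:
    """Horizontal angle of a 3D position, in degrees, 0 = straight ahead."""
    x, _, z = position
    return math.degrees(math.atan2(x, z))


def angular_gap(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def associate(users: List[User], sources: List[SoundSource],
              max_gap_deg: float = 15.0) -> Dict[str, str]:
    """Greedily pair each user with the closest-angle sound source.

    Returns a mapping user_id -> voiceprint_id. Pairs whose angular gap
    exceeds max_gap_deg (an assumed tolerance) are left unassociated.
    """
    candidates = sorted(
        (angular_gap(azimuth_of(u.position), s.azimuth_deg), ui, si)
        for ui, u in enumerate(users)
        for si, s in enumerate(sources)
    )
    used_users, used_sources = set(), set()
    matches: Dict[str, str] = {}
    for gap, ui, si in candidates:
        if gap > max_gap_deg:
            break  # all remaining candidates are even further apart
        if ui in used_users or si in used_sources:
            continue
        used_users.add(ui)
        used_sources.add(si)
        matches[users[ui].user_id] = sources[si].voiceprint_id
    return matches


if __name__ == "__main__":
    users = [User("A", (1.0, 0.0, 1.0)), User("B", (-1.0, 0.0, 1.0))]
    sources = [SoundSource("vp1", -40.0), SoundSource("vp2", 50.0)]
    print(associate(users, sources))
```

Greedy pairing is chosen here only for brevity; with many users standing close together, a global assignment method would be a more robust choice.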