Processing Multimodal User Input for Assistant Systems

In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-r...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Mei, Shawn C.P, Zuo, Zhengping, Natarajan, Vivek
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS CALCULATING COMPUTING COUNTING MUSICAL INSTRUMENTS OPTICAL ELEMENTS, SYSTEMS, OR APPARATUS OPTICS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects, resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference, and presenting a communication content responsive to the speech input and the visual input at the head-mounted device, wherein the communication content comprises information associated with executing results of tasks corresponding to the resolved entities.