Processing Multimodal User Input for Assistant Systems


Bibliographic details
Main authors: Mei, Shawn C.P; Zuo, Zhengping; Natarajan, Vivek
Format: Patent
Language: English
Description
Summary: In one embodiment, a method includes: receiving, at a head-mounted device, a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects; resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference; and presenting, at the head-mounted device, a communication content responsive to the speech input and the visual input, wherein the communication content comprises information associated with execution results of tasks corresponding to the resolved entities.
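
The resolution step in the summary can be illustrated with a minimal sketch. This is not the patented method: it assumes that upstream components (not shown) have already turned the visual input into subjects with attribute dictionaries and the spoken co-reference into a set of attribute constraints, and it swaps in plain exact-match filtering for whatever resolution technique the patent actually uses. The names `Subject`, `resolve_coreference`, and `respond` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subject:
    """A subject detected in the visual input (hypothetical structure)."""
    entity_id: str
    attributes: dict[str, str] = field(default_factory=dict)


def resolve_coreference(mention: dict[str, str], subjects: list[Subject]) -> list[Subject]:
    """Return the subjects whose attributes satisfy every constraint in the mention.

    Assumption: the co-reference (e.g. "the person in the red shirt") has
    already been parsed into attribute constraints such as
    {"type": "person", "shirt": "red"}.
    """
    return [
        s for s in subjects
        if all(s.attributes.get(key) == value for key, value in mention.items())
    ]


def respond(mention: dict[str, str], subjects: list[Subject],
            task: Callable[[Subject], str]) -> list[str]:
    """Run a task on each resolved entity and format the results for presentation."""
    return [f"{s.entity_id}: {task(s)}" for s in resolve_coreference(mention, subjects)]
```

For example, given two detected subjects that differ only in a `shirt` attribute, a mention of `{"type": "person", "shirt": "red"}` selects just the matching subject, and `respond` returns one formatted result for it. A real system would likely need fuzzy matching and ranking rather than exact equality.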