Multimodal entity and coreference resolution for assistant systems

In one embodiment, a method includes receiving, at a client system, an audio input, where the audio input comprises a coreference to a target object, accessing visual data from one or more camera associated with the client system, where the visual data comprises images portraying one or more objects...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Crook, Paul Anthony, Poddar, Shivani, Subba, Rajen, Moon, Seungwhan
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In one embodiment, a method includes receiving, at a client system, an audio input, where the audio input comprises a coreference to a target object, accessing visual data from one or more camera associated with the client system, where the visual data comprises images portraying one or more objects, resolving the coreference to the target object from among the one or more objects, resoling the target object to a specific entity, and providing, at the client system, a response to the audio input, where the response comprises information about the specific entity.