Localization of Objects Encoded in Image Data in Accordance with Natural Language Queries

Bibliographic Details
Main authors: Saffar, Mohammad Taghi; Bertsch, Fred; Piergiovanni, Anthony J; Li, Wei; Angelova, Anelia; Kuo, Wei-Cheng
Format: Patent
Language: English
Description
Abstract: Generally, the disclosure is directed to generalized object localization, where the located object is in accordance with a natural language (NL) query. More specifically, the embodiments include a unified generalized visual localization architecture. The architecture achieves enhanced performance on the following three tasks: referring expression comprehension, object localization, and object detection. The embodiments employ machine-learned NL models and/or image models. The architecture is enabled to understand and answer natural localization questions about an image, to output multiple boxes, to provide no output if the object is not present (e.g., a null result), and to solve general detection tasks.