Machine learning models for video object segmentation

Bibliographic details
Main authors:	Albert Saá-Garriga, Roy Miles, Mehmet Yucel, Moonhwan Jeong, Bruno Manganelli
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; PHYSICS

Description
Summary:	Training a global machine learning (ML) model to perform video object segmentation and video object tracking comprises transmitting (S200) a pre-trained global ML model, comprising high- and low-resolution configurations, to user devices, along with a request (S202) for each device to train the global model using its local training dataset. The locally-trained models, each comprising high- and low-resolution configurations, are received (S204) from the user devices for aggregation. The received high-resolution configurations are combined with one another, and the received low-resolution configurations are combined with one another, to generate a new global ML model (S206). The trained model may segment and track an object (e.g. fig. 16) even when the object changes shape, orientation, position, proximity or angle to the camera that captured the video, among other changes. The best locally-trained ML model may be combined with the pre-trained global ML model; the combination is compared with the global model and replaces it if it performs better. Also disclosed is ML model training that uses a local training dataset and segmentation masks to train a global ML model with high- and low-resolution configurations by reducing differences between an upscaled low-resolution segmentation and a high-resolution segmentation and by applying a knowledge distillation loss.
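
The aggregation step (S206) and the upscaling consistency term can be illustrated with a minimal Python sketch. The summary says only that configurations of the same resolution are "combined"; the element-wise averaging below is an assumption, as are the nearest-neighbour upscaling, the mean-squared-difference measure, the dictionary layout of the weights, and every function name. The knowledge distillation loss is not shown.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]
Mask = List[List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Element-wise mean of one configuration's parameters across devices.

    Averaging is an assumption; the source only says "combined".
    """
    if not models:
        raise ValueError("need at least one locally-trained model")
    return {
        name: [sum(vals) / len(models) for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def aggregate(local_models: List[Dict[str, Weights]]) -> Dict[str, Weights]:
    """Combine high-res with high-res and low-res with low-res, never across."""
    return {
        config: average_weights([m[config] for m in local_models])
        for config in ("high", "low")
    }


def upscale_nearest(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upscaling of a 2D mask by an integer factor (assumed)."""
    return [
        [value for value in row for _ in range(factor)]
        for row in mask
        for _ in range(factor)
    ]


def consistency_loss(low: Mask, high: Mask, factor: int) -> float:
    """Mean squared difference between the upscaled low-res and high-res masks."""
    up = upscale_nearest(low, factor)
    diffs = [(u - h) ** 2 for ur, hr in zip(up, high) for u, h in zip(ur, hr)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    device_a = {"high": {"w": [1.0, 3.0]}, "low": {"w": [0.0]}}
    device_b = {"high": {"w": [3.0, 5.0]}, "low": {"w": [2.0]}}
    print(aggregate([device_a, device_b]))
    print(consistency_loss([[1.0]], [[1.0, 0.0], [1.0, 1.0]], 2))
```

Keeping the two resolutions in separate dictionary entries mirrors the summary's statement that high-resolution configurations are combined only with high-resolution ones, and low-resolution only with low-resolution ones.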