SLICE BY SLICE AI/ML MODEL INFERENCE OVER COMMUNICATION NETWORKS



Bibliographic Details
Main Authors: FILOCHE, Thierry; QUINQUIS, Cyril; LE GUYADEC, Pascal; FONTAINE, Patrick; LAMBERT, Anne; SCHNITZLER, Francois
Format: Patent
Language: English; French; German
Description
Summary: In one implementation, the AI/ML model is first split into several unitary chunks that correspond to sub-parts of the model. An aggregation of unitary chunks is then made by considering the download time, the inference time of the unitary chunks, and/or device constraints. The first split corresponds to a first chunk of AI/ML layers that, once downloaded, is usable as is and generates intermediate results based on some sensing/perception data. As soon as a new chunk arrives, it is used to generate new results based on the intermediate data of the previous chunk. Since download and inference are parallelized, a final result can be generated earlier than with the fully sequential method. In addition, as soon as inference ends on a chunk, that chunk may be removed from the device. Several AI/ML model split methods are provided to generate model subsets/chunks for different model architectures.
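The summary describes a pipeline in which chunk download and chunk inference overlap. Below is a minimal sketch of that idea, not the patented implementation: all names (`fetch_chunk`, `run_chunk`, `NUM_CHUNKS`, the sensing-data placeholder) are hypothetical stand-ins, since the record does not specify any API.

```python
# Hypothetical sketch: overlap chunk download with chunk-by-chunk inference,
# discarding each chunk once its inference is done, as the summary describes.
import queue
import threading

NUM_CHUNKS = 4  # assumed number of aggregated chunks; the real split is decided
                # from download time, inference time, and device constraints.


def fetch_chunk(index):
    """Placeholder for downloading one aggregated chunk of AI/ML layers."""
    return f"chunk-{index}"  # stand-in for downloaded weights


def run_chunk(chunk, inputs):
    """Placeholder for running inference on one chunk, yielding intermediate results."""
    return f"activations({chunk}, {inputs})"


def downloader(chunk_queue):
    # Download chunks in order and hand each one to the inference loop as it arrives.
    for i in range(NUM_CHUNKS):
        chunk_queue.put(fetch_chunk(i))
    chunk_queue.put(None)  # sentinel: no more chunks


def pipelined_inference(sensing_data):
    chunk_queue = queue.Queue()
    threading.Thread(target=downloader, args=(chunk_queue,), daemon=True).start()

    activations = sensing_data
    while True:
        chunk = chunk_queue.get()  # blocks only until the next chunk is available
        if chunk is None:
            break
        # The first chunk consumes the raw sensing/perception data; later chunks
        # consume the intermediate results of the previous chunk.
        activations = run_chunk(chunk, activations)
        del chunk  # the chunk is no longer needed once its inference has finished
    return activations


if __name__ == "__main__":
    print(pipelined_inference("sensor frame"))
```

Because the first chunk can start running while later chunks are still in transit, the final result in this sketch is available earlier than if the whole model had to be downloaded before any inference began.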