SLICE BY SLICE AI/ML MODEL INFERENCE OVER COMMUNICATION NETWORKS
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | eng ; fre ; ger |
Summary: | In one implementation, the AI/ML model is first split into several unitary chunks, each corresponding to a sub-part of the model. The unitary chunks are then aggregated by considering the download time and inference time of the unitary chunks and/or device constraints. The first split corresponds to a first chunk of AI/ML layers that, once downloaded, is usable as is and generates intermediate results from sensing/perception data. As soon as a new chunk arrives, it is used to generate new results from the intermediate data of the previous chunk. Because download and inference run in parallel, a final result can be generated earlier than with the fully sequential method. In addition, as soon as inference on a chunk ends, that chunk may be removed from the device. Several AI/ML model split methods are provided to generate model subsets/chunks for different model architectures. |
---|
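
The timing benefit described in the summary can be illustrated with a small model. The sketch below is not taken from the patent: the `Chunk` class, both function names, and all timing values are hypothetical, and it assumes chunks are downloaded back to back over a single link while inference on a chunk can start only once that chunk has arrived and the previous chunk has finished.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical model chunk with assumed per-chunk timings in seconds."""
    name: str
    download_s: float   # assumed time to download this chunk
    inference_s: float  # assumed time to run inference on this chunk


def sequential_finish(chunks: List[Chunk]) -> float:
    """Finish time when the whole model is downloaded before inference starts."""
    total_download = sum(c.download_s for c in chunks)
    total_inference = sum(c.inference_s for c in chunks)
    return total_download + total_inference


def pipelined_finish(chunks: List[Chunk]) -> float:
    """Finish time when inference on a chunk overlaps later downloads."""
    download_done = 0.0   # time at which the current chunk has fully arrived
    inference_done = 0.0  # time at which the previous chunk finished inference
    for c in chunks:
        download_done += c.download_s
        # Inference needs both the chunk itself and the previous chunk's
        # intermediate result, so it starts at the later of the two times.
        start = max(download_done, inference_done)
        inference_done = start + c.inference_s
    return inference_done


if __name__ == "__main__":
    # Illustrative values only; they are not measurements.
    model = [Chunk("c1", 2.0, 1.0), Chunk("c2", 2.0, 1.0), Chunk("c3", 2.0, 1.0)]
    print("sequential:", sequential_finish(model))
    print("pipelined: ", pipelined_finish(model))
```

In this model the pipelined finish time never exceeds the sequential one, and the gain depends on how much inference time can be hidden behind the remaining downloads, which is one reason the aggregation step would weigh download and inference times together.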