Dataflow Execution Time Estimation for In-Memory Distributed Processing Framework

Techniques are provided for dataflow execution time estimation for distributed processing frameworks. An exemplary method comprises: obtaining an input dataset for a dataflow for execution; determining a substantially minimal data unit for a given operation of the dataflow processed by the given ope...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Costa, Bruno Carlos da Cunha, Ciarlini, Angelo E. M, Gottin, Vinícius Michel, Condori, Edward José Pacheco, Dias, Jonas F, Pires, Paulo de Figueiredo, Vieira, Wagner dos Santos, Souto, Yania Molina, Porto, Fábio André Machado
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Techniques are provided for dataflow execution time estimation for distributed processing frameworks. An exemplary method comprises: obtaining an input dataset for a dataflow for execution; determining a substantially minimal data unit for a given operation of the dataflow processed by the given operation; estimating a number of rounds required to execute a number of data units in the input dataset using nodes assigned to execute the given operation; determining an execution time spent by the given operation to process one data unit; estimating the execution time for the given operation based on the execution time spent by the given operation to process one data unit and the number of rounds required to execute the number of data units in the input dataset; and executing the given operation with the input dataset. A persistent cost model is optionally employed to record the execution times of known dataflow operations.