Dataflow Execution Time Estimation for In-Memory Distributed Processing Framework
Format: | Patent |
---|---|
Language: | English |
Abstract: | Techniques are provided for dataflow execution time estimation for distributed processing frameworks. An exemplary method comprises: obtaining an input dataset for a dataflow for execution; determining a substantially minimal data unit for a given operation of the dataflow processed by the given operation; estimating a number of rounds required to execute a number of data units in the input dataset using nodes assigned to execute the given operation; determining an execution time spent by the given operation to process one data unit; estimating the execution time for the given operation based on the execution time spent by the given operation to process one data unit and the number of rounds required to execute the number of data units in the input dataset; and executing the given operation with the input dataset. A persistent cost model is optionally employed to record the execution times of known dataflow operations. |
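The estimation steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each assigned node processes exactly one data unit per round, so the number of rounds is the ceiling of data units over nodes, and the operation time is the per-unit time multiplied by that number of rounds. The names `estimate_rounds`, `estimate_execution_time`, and `CostModel`, as well as the JSON file used for persistence, are hypothetical and chosen only for illustration.

```python
import json
import os


def estimate_rounds(num_data_units: int, num_nodes: int) -> int:
    """Rounds needed if every node handles one data unit per round (assumption)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if num_data_units < 0:
        raise ValueError("num_data_units must be non-negative")
    # Integer ceiling division: a partially filled last round still costs a full round.
    return (num_data_units + num_nodes - 1) // num_nodes


def estimate_execution_time(num_data_units: int, num_nodes: int,
                            unit_time_s: float) -> float:
    """Estimated operation time = rounds * time to process one data unit."""
    return estimate_rounds(num_data_units, num_nodes) * unit_time_s


class CostModel:
    """Hypothetical persistent store of per-unit execution times, keyed by operation name."""

    def __init__(self, path: str):
        self.path = path
        self.unit_times = {}
        # Reload previously recorded times, if a model file already exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.unit_times = json.load(f)

    def record(self, operation: str, unit_time_s: float) -> None:
        """Store the measured per-unit time and write the model back to disk."""
        self.unit_times[operation] = unit_time_s
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.unit_times, f)

    def lookup(self, operation: str):
        """Return the recorded per-unit time, or None for an unknown operation."""
        return self.unit_times.get(operation)
```

Under these assumptions, 10 data units on 4 nodes take 3 rounds, so an operation that needs 2.0 seconds per data unit is estimated at 6.0 seconds. A known operation's per-unit time can be read from `CostModel.lookup` instead of being measured again.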