Proteus: Simulating the Performance of Distributed DNN Training
Format: Article
Language: English
Online access: Order full text
Summary: DNN models are becoming increasingly larger to achieve unprecedented accuracy, and the accompanying increased computation and memory requirements necessitate the employment of massive clusters and elaborate parallelization strategies to accelerate DNN training. In order to better optimize performance and analyze cost, it is indispensable to model the training throughput of distributed DNN training. However, complex parallelization strategies and the resulting complex runtime behaviors make it challenging to construct an accurate performance model. In this paper, we present Proteus, the first standalone simulator to model the performance of complex parallelization strategies through simulated execution. Proteus first models complex parallelization strategies with a unified representation named the Strategy Tree. It then compiles the strategy tree into a distributed execution graph and simulates the complex runtime behaviors, namely computation-communication overlap and bandwidth sharing, with a Hierarchical Topo-Aware Executor (HTAE). We finally evaluate Proteus across a wide variety of DNNs on three hardware configurations. Experimental results show that Proteus achieves a $3.0\%$ average prediction error and preserves the ordering of training throughput across various parallelization strategies. Compared to state-of-the-art approaches, Proteus reduces prediction error by up to $133.8\%$.
DOI: 10.48550/arxiv.2306.02267
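The abstract's central runtime behavior, computation-communication overlap, can be illustrated with a minimal sketch. This is not Proteus code; the function name, layer structure, and all timing numbers are hypothetical, and the model is a deliberately simple one: backward compute runs serially per layer, while each layer's gradient all-reduce can proceed concurrently on a single shared communication channel.

```python
# Toy illustration (not Proteus): estimating one training-step time
# under computation-communication overlap, the behavior the abstract
# says the HTAE simulates. All timings below are made-up numbers.

def step_time(compute_ms, comm_ms, overlap=True):
    """Estimate step latency for a layer-wise backward pass.

    compute_ms[i]: backward compute time of layer i (milliseconds)
    comm_ms[i]:    gradient all-reduce time of layer i (milliseconds)
    With overlap, layer i's communication runs concurrently with the
    remaining layers' compute; without it, everything serializes.
    """
    if not overlap:
        return sum(compute_ms) + sum(comm_ms)
    t = 0.0          # wall-clock time on the compute stream
    comm_free = 0.0  # time at which the comm channel becomes free
    for comp, comm in zip(compute_ms, comm_ms):
        t += comp                   # compute runs serially
        start = max(t, comm_free)   # comm waits for channel and data
        comm_free = start + comm
    return max(t, comm_free)        # step ends when both streams finish

compute = [4.0, 4.0, 4.0]
comm = [3.0, 3.0, 3.0]
print(step_time(compute, comm, overlap=False))  # 21.0
print(step_time(compute, comm, overlap=True))   # 15.0
```

Even this toy model shows why simulation matters: the overlapped step (15 ms) is not simply `sum(compute)` or `sum(compute) + sum(comm)`, and the gap between the two estimates grows with the compute-to-communication ratio, which is exactly the kind of interaction an analytical model struggles to capture across arbitrary parallelization strategies.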