Parallel method based on hybrid architecture in distributed training
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Summary: | To reduce communication overhead and training time in distributed training, a first computing node divides a data block allocated to a processing unit into a plurality of data segments, including at least a first data segment and a second data segment. The data segments are allocated to a plurality of threads, including at least a first thread and a second thread. In one embodiment, the first thread performs an intra-node sub-operation on a portion of the first data segment while, in parallel, the second thread performs an inter-node sub-operation on a portion of the second data segment. The intra-node and inter-node link structures are thus used simultaneously, which reduces the idle time of both. |
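The summary's scheme (split a block into segments, give each segment to a thread, and let one thread use the intra-node link while another uses the inter-node link) can be illustrated with a small simulation. This is a minimal sketch and not the patent's implementation: the abstract does not name the sub-operations, so a hierarchical sum all-reduce is assumed here, the two links are modeled as two locks, and the function name `hybrid_all_reduce` and its parameters are hypothetical.

```python
import threading


def hybrid_all_reduce(blocks, num_segments=2):
    """Simulated hierarchical sum all-reduce (an assumed sub-operation).

    blocks[node][unit] is that processing unit's data block (a list of
    numbers). Returns the same shape, with every unit holding the global sum.
    """
    num_nodes = len(blocks)
    num_units = len(blocks[0])
    length = len(blocks[0][0])

    # Divide the block's index range into contiguous data segments.
    bounds = [
        (s * length // num_segments, (s + 1) * length // num_segments)
        for s in range(num_segments)
    ]
    result = [[[0] * length for _ in range(num_units)] for _ in range(num_nodes)]

    # Assumption: each link structure is modeled as a lock, so only one
    # thread at a time "occupies" a given link.
    intra_link = threading.Lock()
    inter_link = threading.Lock()

    def worker(lo, hi):
        # Intra-node sub-operation: reduce across the units of each node.
        with intra_link:
            node_sums = [
                [sum(blocks[n][u][i] for u in range(num_units)) for i in range(lo, hi)]
                for n in range(num_nodes)
            ]
        # Inter-node sub-operation: reduce across nodes. Another thread can
        # hold intra_link for its own segment while this one runs.
        with inter_link:
            total = [
                sum(node_sums[n][j] for n in range(num_nodes))
                for j in range(hi - lo)
            ]
        # Intra-node sub-operation: broadcast the result to every unit.
        with intra_link:
            for n in range(num_nodes):
                for u in range(num_units):
                    result[n][u][lo:hi] = total

    # One thread per data segment.
    threads = [threading.Thread(target=worker, args=b) for b in bounds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Because each thread releases one lock before taking the other, the thread handling the second segment can occupy the intra-node link while the thread handling the first segment occupies the inter-node link, which is the overlap the summary describes. In a real system the locks would be replaced by actual transfers over the intra-node and inter-node interconnects.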