Parallel method based on hybrid architecture in distributed training

Bibliographic details
Main authors: WANG SHAOCHUANG, YE JIANXI, DONG JIANBO, RAN QIANYUAN
Format: Patent
Language: Chinese; English
Description
Abstract: In distributed training, to save communication overhead and reduce training time, a first computing node can divide a data block allocated to a processing unit into a plurality of data segments, including at least a first data segment and a second data segment. The data segments are allocated to a plurality of threads, including at least a first thread and a second thread. In one embodiment, the first thread performs an intra-node sub-operation on a portion of the first data segment while, in parallel, the second thread performs an inter-node sub-operation on a portion of the second data segment. The intra-node and inter-node link structures are thus used simultaneously, which reduces the idle time of both.
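
The scheduling idea in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation. Everything named here is a hypothetical stand-in: `Link` models a link structure with a lock, `split_block` and `hybrid_reduce` are illustrative helpers, and element-wise addition stands in for the real intra-node and inter-node sub-operations. The sketch also assumes that the two threads visit the links in opposite order, which is one way to keep both links busy; the abstract states only that the two sub-operations run in parallel.

```python
import threading


def split_block(block, num_segments):
    """Split a data block into contiguous, nearly equal data segments."""
    size, rem = divmod(len(block), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        end = start + size + (1 if i < rem else 0)
        segments.append(block[start:end])
        start = end
    return segments


class Link:
    """Hypothetical stand-in for a link structure (intra-node or inter-node).

    The lock models exclusive use of the link by one thread at a time.
    """

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.log = []  # records (thread name, segment index) per use

    def run(self, thread_name, segment_id, op, data):
        with self.lock:
            self.log.append((thread_name, segment_id))
            return op(data)


def add(peer):
    """Return a sub-operation that adds a peer's contribution element-wise."""
    return lambda data: [a + b for a, b in zip(data, peer)]


def worker(name, segment_id, segment, steps, out):
    """Run one thread's sub-operations on its data segment, in order."""
    data = list(segment)
    for link, op in steps:
        data = link.run(name, segment_id, op, data)
    out[segment_id] = data


def hybrid_reduce(block, local_peer_block, remote_peer_block):
    """Reduce a block with two threads that start on different links."""
    intra, inter = Link("intra-node"), Link("inter-node")
    mine = split_block(block, 2)
    local = split_block(local_peer_block, 2)
    remote = split_block(remote_peer_block, 2)

    # Assumption: opposite link order per thread, so that thread 1 uses the
    # intra-node link while thread 2 uses the inter-node link, then they swap.
    plans = [
        [(intra, add(local[0])), (inter, add(remote[0]))],
        [(inter, add(remote[1])), (intra, add(local[1]))],
    ]

    out = [None, None]
    threads = [
        threading.Thread(
            target=worker,
            args=(f"thread-{i + 1}", i, mine[i], plans[i], out),
        )
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[0] + out[1], intra.log, inter.log


if __name__ == "__main__":
    result, intra_log, inter_log = hybrid_reduce(
        [1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]
    )
    print(result)  # [111, 222, 333, 444]
```

Because addition is commutative, the reduced result does not depend on which link a segment crosses first. That property is what lets the two threads take the links in opposite order in this sketch.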