Parallel method based on hybrid architecture in distributed training

Bibliographic details
Main authors:	WANG SHAOCHUANG, YE JIANXI, DONG JIANBO, RAN QIANYUAN
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Summary:	In distributed training, to save communication overhead and reduce training time, a first computing node can divide a data block allocated to a processing unit into a plurality of data segments, including at least a first data segment and a second data segment. The node allocates the data segments to a plurality of threads, including at least a first thread and a second thread. In one embodiment, the first thread performs an intra-node sub-operation on a portion of the first data segment while, in parallel, the second thread performs an inter-node sub-operation on a portion of the second data segment. Using the intra-node and inter-node link structures at the same time reduces the idle time of both.
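The segment-and-thread scheme in the summary can be illustrated with a small, single-process simulation. This is only a sketch under stated assumptions and not the patented implementation: the collective is assumed to be an element-wise sum (all-reduce), the two link structures are modelled as plain locks, and the second thread is assumed to run its inter-node sub-operation first so that both links are busy in the same step. All names (`split_block`, `hybrid_allreduce`, `reduce_intra`, `reduce_inter`) are hypothetical.

```python
import threading


def split_block(block_len, num_segments):
    """Return (start, stop) index ranges that cover the block in order."""
    base, extra = divmod(block_len, num_segments)
    ranges, start = [], 0
    for i in range(num_segments):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def reduce_intra(grid):
    """Intra-node sub-operation: sum over the units inside each node."""
    return [[[sum(vals) for vals in zip(*node)]] for node in grid]


def reduce_inter(grid):
    """Inter-node sub-operation: sum over nodes for each unit position."""
    num_units = len(grid[0])
    return [[[sum(vals) for vals in zip(*(node[u] for node in grid))]
             for u in range(num_units)]]


def hybrid_allreduce(blocks, num_segments=2):
    """blocks[node][unit] is one processing unit's data block (a list).

    Returns (element-wise sum over all nodes and units, timeline).
    """
    ranges = split_block(len(blocks[0][0]), num_segments)
    intra_link = threading.Lock()   # stand-in for the intra-node link
    inter_link = threading.Lock()   # stand-in for the inter-node link
    barrier = threading.Barrier(num_segments)  # keeps threads in lockstep
    partial, timeline = {}, []

    def worker(seg_id, start, stop):
        grid = [[unit[start:stop] for unit in node] for node in blocks]
        phases = [("intra", intra_link, reduce_intra),
                  ("inter", inter_link, reduce_inter)]
        if seg_id % 2 == 1:
            # Assumption: odd segments start on the inter-node link, so the
            # two links are used simultaneously (sums commute, so the
            # order of the two reductions does not change the result).
            phases.reverse()
        for name, link, op in phases:
            with link:
                timeline.append((seg_id, name))
                grid = op(grid)
            barrier.wait()
        partial[seg_id] = grid[0][0]

    threads = [threading.Thread(target=worker, args=(i, s, e))
               for i, (s, e) in enumerate(ranges)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    result = []
    for i in range(num_segments):
        result.extend(partial[i])
    return result, timeline


# Two nodes, two processing units per node, four elements per data block.
blocks = [[[1, 2, 3, 4], [10, 20, 30, 40]],
          [[100, 200, 300, 400], [1000, 2000, 3000, 4000]]]
total, timeline = hybrid_allreduce(blocks)
print(total)
print(timeline)
```

With two segments, the first step has one thread holding the intra-node lock and the other holding the inter-node lock, and the second step swaps them, so neither simulated link sits idle while the other is in use.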