Fast neural network training on a cluster of GPUs for action recognition with high accuracy

Bibliographic details
Published in:	Journal of parallel and distributed computing 2019-12, Vol.134 (na), p.153-165
Main authors:	Cong, Guojing; Domeniconi, Giacomo; Yang, Chih-Chieh; Shapiro, Joshua; Zhou, Fan; Chen, Barry
Format:	Article
Language:	English
Subjects:	Distributed training; GPU; Machine learning; MATHEMATICS AND COMPUTING; Transfer learning; Video analytics
Online access:	Full text

Description
Abstract:	We propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. The convergence analysis of our algorithm shows it is possible to reduce communication cost and at the same time minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer learning to further reduce training time while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9% respectively. With an additional end-to-end trained temporal stream, the validation accuracies achieved for UCF101 and HMDB51 are 93.47% and 81.24% respectively. As far as we know, these are the highest accuracies achieved with the two-stream approach using ResNet that does not involve computationally expensive 3D convolutions or pretraining on much larger datasets.
• We adopt a communication-efficient, adaptive batch size K-step averaging algorithm that achieves very good parallel speedups.
• We employ transfer learning to further reduce training time while improving validation accuracy.
• With an additional end-to-end trained temporal stream, the validation accuracies achieved for UCF101 and HMDB51 are 93.47% and 81.24% respectively.
ISSN:	0743-7315, 1096-0848
DOI:	10.1016/j.jpdc.2019.07.009
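
The abstract names a K-step averaging algorithm but this record gives no details of it. As a rough illustration only, the sketch below simulates the general idea of K-step averaging (often called local SGD): each worker takes K local gradient steps on its own data shard, then all workers average their weights in a single communication round. Everything here is an assumption for illustration: the one-parameter least-squares problem, the function names `local_sgd_steps`, `k_step_averaging`, and `make_shards`, and the use of plain SGD. It does not reproduce the authors' adaptive batch size scheme or their customized Adam optimizer.

```python
import random


def local_sgd_steps(w, shard, k, lr, rng):
    """Run k plain SGD steps on one worker's shard, starting from weight w."""
    for _ in range(k):
        x, y = rng.choice(shard)          # sample one example from the local shard
        grad = 2.0 * (w * x - y) * x      # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def k_step_averaging(shards, k, rounds, lr=0.05, seed=0):
    """Simulate K-step averaging on a single process.

    In each round every worker starts from the shared weight, takes k local
    steps, and the results are averaged. The average stands in for one
    all-reduce, so communication happens once per round, not once per step.
    """
    rng = random.Random(seed)
    w = 0.0  # shared starting weight (hypothetical initial value)
    for _ in range(rounds):
        local_weights = [local_sgd_steps(w, shard, k, lr, rng) for shard in shards]
        w = sum(local_weights) / len(local_weights)
    return w


def make_shards(num_workers, per_worker, true_w=3.0, seed=1):
    """Build synthetic noiseless data y = true_w * x, split across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(per_worker):
            x = rng.uniform(-1.0, 1.0)
            shard.append((x, true_w * x))
        shards.append(shard)
    return shards


if __name__ == "__main__":
    shards = make_shards(num_workers=4, per_worker=32)
    w = k_step_averaging(shards, k=8, rounds=50, lr=0.1)
    print(f"learned weight: {w:.4f} (target 3.0)")
```

With `k=8` and `rounds=50`, each simulated worker performs 400 gradient steps but the weights are averaged only 50 times, which is the kind of trade-off between communication cost and iteration count that the abstract refers to.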