A highly efficient implementation of a backpropagation learning algorithm using matrix ISA


Full description

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, 2008-07, Vol. 68 (7), p. 949-961
Main Authors: Soliman, Mostafa I.; Mohamed, Samir A.
Format: Article
Language: English
Online Access: Full text
Description
Summary: BackPropagation (BP) is the best-known learning algorithm for Artificial Neural Networks (ANNs). BP has received intensive research effort aimed at exploiting its parallelism to reduce training time on complex problems. A modified version of BP based on matrix–matrix multiplication was proposed for parallel processing. In this paper, we present implementations of Matrix BackPropagation (MBP) using scalar, vector, and matrix Instruction Set Architectures (ISAs). We show that MBP performance improves when switching from scalar ISA to vector ISA, and improves further when switching from vector ISA to matrix ISA. On a practical application, speech recognition, the speedup of training a neural network using unrolled scalar ISA over scalar ISA is 1.83. On eight parallel lanes, the speedups of using vector, unrolled vector, and matrix ISAs are 10.33, 11.88, and 15.36 respectively, where the maximum theoretical speedup is 16. These results show that the matrix ISA achieves close-to-optimal performance because it reuses loaded data, reduces loop overhead, and overlaps memory operations with arithmetic operations.
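The idea behind Matrix BackPropagation, as summarized above, is to batch the training patterns so that each step of BP becomes a matrix–matrix multiplication (GEMM), which is what lets a matrix ISA reuse loaded data and amortize loop overhead. The following is a minimal NumPy sketch of that formulation; the network shape, learning rate, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mbp_epoch(X, T, W1, W2, lr=0.5):
    """One batched gradient step for a one-hidden-layer network.

    X: (patterns x inputs), T: (patterns x outputs) targets.
    Every heavy step below is a matrix-matrix product, which is the
    structure a matrix ISA (or a GEMM routine) exploits.
    Shapes and hyperparameters are hypothetical, not from the paper.
    """
    H = sigmoid(X @ W1)               # hidden activations: one GEMM
    Y = sigmoid(H @ W2)               # output activations: one GEMM
    dY = (Y - T) * Y * (1.0 - Y)      # output deltas (squared-error loss)
    dH = (dY @ W2.T) * H * (1.0 - H)  # hidden deltas: one GEMM
    W2 -= lr * (H.T @ dY)             # weight gradients as GEMMs
    W1 -= lr * (X.T @ dH)
    return 0.5 * np.sum((Y - T) ** 2)  # total batch error

# Toy usage: four XOR patterns processed as one matrix batch.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1 = rng.normal(size=(2, 8))
W2 = rng.normal(size=(8, 1))
errs = [mbp_epoch(X, T, W1, W2) for _ in range(5000)]
```

Note that per-pattern (online) BP would instead perform many small matrix–vector products; batching them into the GEMMs above is exactly the restructuring that the abstract credits for the near-optimal matrix-ISA speedup.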
ISSN: 0743-7315
eISSN: 1096-0848
DOI: 10.1016/j.jpdc.2007.12.004