The Microarchitecture of DOJO, Tesla's Exa-Scale Computer

Bibliographic details
Published in: IEEE Micro 2023-05, Vol. 43 (3), p. 31-39
Main authors: Talpes, Emil; Sarma, Debjit Das; Williams, Doug; Arora, Sahil; Kunjan, Thomas; Floering, Benjamin; Jalote, Ankit; Hsiong, Christopher; Poorna, Chandrasekhar; Samant, Vaidehi; Sicilia, John; Nivarti, Anantha Kumar; Ramachandran, Raghuvir; Fischer, Tim; Herzberg, Ben; McGee, Bill; Venkataramanan, Ganesh; Banon, Pete
Format: Article
Language: English
Description
Summary: The Tesla-built DOJO system is a scalable solution targeted at machine learning training applications. It is based on the D1 custom compute chip, which packs together 354 independent processors, resulting in 362 TFLOPS of compute and 440 MB of internal static random-access memory storage. While maintaining full programmability, DOJO emphasizes distribution of resources and an extremely high-bandwidth interconnect, allowing it to scale from small systems all the way to exaFLOP supercomputers.
ISSN: 0272-1732, 1937-4143
DOI: 10.1109/MM.2023.3258906