Simulation of Quantum Many-Body Dynamics with Tensor Processing Units: Floquet Prethermalization
Published in: | PRX Quantum 2022-05, Vol.3 (2), p.020331, Article 020331 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Tensor processing units (TPUs) are specialized hardware accelerators developed by Google to support large-scale machine-learning tasks, but they can also be leveraged to accelerate and scale other linear-algebra-intensive computations. In this paper, we demonstrate the use of TPUs for massively parallel classical simulations of quantum many-body dynamics on long time scales. We apply our methods to study the phenomenon of Floquet prethermalization, i.e., exponentially slow heating in quantum spin chains subject to high-frequency periodic driving. We simulate the dynamics of L=34 qubits for over 10^{5} Floquet periods, corresponding to circuits with 4×10^{6} nearest-neighbor two-qubit gates. The circuits simulated have no additional symmetries and represent a pure-state evolution in the full 2^{L}-dimensional Hilbert space. This is achieved by distributing the computation over 128 TPU cores. On a TPU cluster of that size, we find speed-ups in wall-clock run time of 230 times and 15 times when compared to reference CPU and single-graphics-processing-unit (GPU) simulations, respectively, for shorter-time 30-qubit simulations that can be handled by all three platforms. We study the computational cost of the simulations, as a function of both the number of qubits and the number of TPU cores used, up to our maximum capacity of L=40 qubits, which requires a “full pod” of 2048 TPU cores with tens of terabytes of memory in total. For these simulations, an eight-TPU-core machine is comparable to a single A100 GPU, and thus the full TPU pod is comparable to a machine with hundreds of top-of-the-line GPUs. However, the TPU pod is more energy and cost efficient and readily accessible (via Google Cloud), unlike such large many-GPU configurations. We also study the accumulation of numerical error as a function of circuit depth in very deep circuits. Our work demonstrates that TPUs can offer significant advantages for state-of-the-art simulations of quantum many-body dynamics. |
ISSN: | 2691-3399 |
DOI: | 10.1103/PRXQuantum.3.020331 |
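
The abstract ties the qubit count to the size of the TPU cluster through the 2^{L}-dimensional Hilbert space. A minimal sketch of that arithmetic follows. It assumes each amplitude is stored as a single-precision complex number (8 bytes), which the abstract does not state, and the function names `statevector_bytes` and `bytes_per_core` are hypothetical helpers introduced here for illustration. It counts only one copy of the state vector, not any working buffers a real simulation would need.

```python
# Rough memory estimate for a full state-vector simulation of L qubits.
# Assumption (not stated in the abstract): each amplitude is stored as a
# single-precision complex number, i.e. 8 bytes.

BYTES_PER_AMPLITUDE = 8  # assumed complex64 storage


def statevector_bytes(num_qubits: int,
                      bytes_per_amplitude: int = BYTES_PER_AMPLITUDE) -> int:
    """Bytes needed for one copy of a 2**num_qubits-dimensional state vector."""
    return (2 ** num_qubits) * bytes_per_amplitude


def bytes_per_core(num_qubits: int, num_cores: int) -> float:
    """Bytes held by each core if one copy is split evenly across all cores."""
    return statevector_bytes(num_qubits) / num_cores


if __name__ == "__main__":
    # Qubit counts and core counts quoted in the abstract.
    for qubits, cores in [(34, 128), (40, 2048)]:
        total_tib = statevector_bytes(qubits) / 2**40
        per_core_gib = bytes_per_core(qubits, cores) / 2**30
        print(f"L={qubits}: {total_tib:.3f} TiB total, "
              f"{per_core_gib:.2f} GiB per core on {cores} cores")
```

Under this assumption, each additional qubit doubles the storage, so moving from L=34 to L=40 multiplies the state-vector size by 64, while the core count quoted in the abstract grows by a factor of 16.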