The O2 software framework and GPU usage in ALICE online and offline reconstruction in Run 3

Bibliographic Details
Main Authors:	Eulisse, Giulio; Rohr, David
Format:	Article
Language:	English
Subjects:	Physics - Instrumentation and Detectors

Description
Summary:	ALICE has upgraded many of its detectors for LHC Run 3 to operate in continuous readout mode, recording Pb-Pb collisions at a 50 kHz interaction rate without a trigger. This results in the need to process data in real time at rates 100 times higher than during Run 2. To tackle this challenge, we introduced O2, a new computing system and the associated infrastructure. Designed and implemented during the LHC Long Shutdown 2, O2 is now in production and handles all the data processing needs of the experiment. O2 is designed around the message-passing paradigm, enabling resilient, parallel data processing for both the synchronous (to LHC beam) and asynchronous data taking and processing phases. The main purpose of the synchronous online reconstruction is detector calibration and raw data compression. This synchronous processing is dominated by the TPC detector, which produces by far the largest data volume, and TPC reconstruction runs fully on GPUs. When there is no beam in the LHC, the powerful GPU-equipped online computing farm of ALICE is used for the asynchronous reconstruction, which creates the final reconstructed output for analysis from the compressed raw data. Since the majority of the compute performance of the online farm is in the GPUs, and since the asynchronous processing is not dominated by the TPC in the way the synchronous processing is, there is an ongoing effort to offload a significant amount of compute load from other detectors to the GPU as well.
DOI:	10.48550/arxiv.2402.01205