GPU-accelerated MART and concurrent cross-correlation for tomographic PIV

Bibliographic details
Published in:	Experiments in Fluids, 2022-05, Vol. 63 (5), Article 91
Main authors:	Zeng, Xin; He, Chuangxin; Liu, Yingzheng
Format:	Article
Language:	English
Subjects:	Batch processing; Central processing units; Computer architecture; Concurrency; CPUs; Cross correlation; Data processing; Efficiency; Engineering; Engineering Fluid Dynamics; Engineering Thermodynamics; Fast Fourier transformations; Fluid- and Aerodynamics; Fourier transforms; Graphics processing units; Heat and Mass Transfer; Image reconstruction; Jet flow; Kernel functions; Mathematical analysis; Particle image velocimetry; Research Article; Vectors (mathematics); Velocity; Velocity distribution
Online access:	Full text

Description
Abstract:	This paper presents a novel Graphics Processing Unit (GPU)-accelerated method for large-scale data processing in tomographic particle image velocimetry. The multiplicative algebraic reconstruction technique (MART) is used to reconstruct three-dimensional (3D) particle fields, and cross-correlation with the fast Fourier transform is used to generate the displacement vectors. The Compute Unified Device Architecture (CUDA) C programming model is used to port the velocity field reconstruction from CPU code to GPU code to improve efficiency. For similar reconstruction tasks, a particular thread grid hierarchy is designed to construct the corresponding computational kernel functions, and each task is launched in a single thread. A modified pixel batch processing strategy is then used to manage GPU memory access. Subsequently, asynchronous stream concurrency is used to generate the velocity field with the GPU cuFFT library. A synthetic 3D experiment with a ring vortex is carried out to verify the accuracy and efficiency of the developed method. The parallel results agree well with the generated data and with conclusions reported in the literature. The speed-up ratio of the multi-core CPU (Intel® Xeon® Platinum 8168) parallel implementation with OpenMP converges to 2.5× in MFG-MART and 3.0× in cross-correlation. Compared with a 24-core CPU implementation, a GPU (NVIDIA Tesla V100S, 32 GB) under maximum memory usage achieves a speed-up ratio of over 20× in parallel MFG-MART and 4× in concurrent cross-correlation. A measurement of turbulent flow in a circular jet at a Reynolds number of 3,000 is used to examine the efficiency gain of the parallelized framework in real experimental settings. For the synthetic volume reconstruction of 700 × 700 × 140 voxels and cross-correlation with a 41³-voxel window at 75% overlap, and for the experimental volume reconstruction of 550 × 1100 × 550 voxels and cross-correlation with a 32³-voxel window at 50% overlap, one frame of the velocity field can be completed within 2 min in each domain.
ISSN:	0723-4864, 1432-1114
DOI:	10.1007/s00348-022-03444-3
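
Illustrative sketch of the two stages named in the abstract: the following is a minimal CPU-only Python sketch, not the authors' CUDA implementation. It shows (1) the standard MART multiplicative update as it is commonly stated, and (2) displacement estimation from the peak of an FFT-based cross-correlation. All function names, the sparse-row layout of the weights, the relaxation parameter `mu`, and the reduction to a 1D signal of power-of-two length are assumptions made for illustration; the paper's MFG-MART variant, its pixel batching, its stream concurrency, and its 3D cuFFT windows are not reproduced here.

```python
import cmath


def mart_reconstruct(weights, intensities, n_voxels, mu=1.0, n_iter=5):
    """Standard MART update (illustrative, hypothetical data layout).

    weights[i] is a dict {voxel_index: w_ij} for pixel i,
    intensities[i] is the recorded intensity of pixel i.
    """
    field = [1.0] * n_voxels  # uniform positive initial guess
    for _ in range(n_iter):
        for w_row, measured in zip(weights, intensities):
            # Forward projection of the current field onto pixel i
            projected = sum(w * field[j] for j, w in w_row.items())
            if projected <= 0.0:
                continue  # nothing left along this line of sight
            ratio = measured / projected
            # Multiplicative correction of every voxel seen by pixel i
            for j, w in w_row.items():
                field[j] *= ratio ** (mu * w)
    return field


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def circular_cross_correlation(a, b):
    """corr[k] = sum_n a[n] * b[(n + k) mod N], computed via the FFT."""
    n = len(a)
    fa = fft([complex(v) for v in a])
    fb = fft([complex(v) for v in b])
    prod = [x.conjugate() * y for x, y in zip(fa, fb)]
    return [c.real / n for c in fft(prod, inverse=True)]


def peak_shift(a, b):
    """Integer displacement of b relative to a, taken at the correlation peak."""
    corr = circular_cross_correlation(a, b)
    n = len(corr)
    k = max(range(n), key=corr.__getitem__)
    return k if k <= n // 2 else k - n  # map wrap-around to a signed shift


if __name__ == "__main__":
    # Toy case: pixel 0 sees voxels 0 and 1, pixel 1 sees voxels 1 and 2
    rows = [{0: 1.0, 1: 1.0}, {1: 1.0, 2: 1.0}]
    print(mart_reconstruct(rows, [1.0, 0.0], n_voxels=3, n_iter=2))
    # Toy case: the second signal is the first one moved two samples right
    print(peak_shift([0, 1, 3, 1, 0, 0, 0, 0], [0, 0, 0, 1, 3, 1, 0, 0]))
```

In this sketch the pixel loop of the MART step and the per-window correlations are independent units of work, which is the kind of structure a GPU port could distribute across threads and streams; how the paper actually maps them is described only at the level given in the abstract.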