Efficient and Portable Distribution Modeling for Large-Scale Scientific Data Processing with Data-Parallel Primitives

Bibliographic details
Published in:	Algorithms 2021-10, Vol.14 (10), p.285
Main authors:	Yang, Hao-Yi, Lin, Zhi-Rong, Wang, Ko-Chih
Format:	Article
Language:	English
Keywords:	Algorithms; Data analysis; Data processing; data-parallel primitive; Datasets; distribution-based approach; Hard disks; Histograms; large-scale data processing; Modelling; parallel algorithm; Probabilistic models; Representations; scientific dataset; Simulation; Supercomputers; Visualization

Description
Abstract:	The use of distribution-based data representation to handle large-scale scientific datasets is a promising approach. Distribution-based approaches often transform a scientific dataset into many distributions, each of which is calculated from a small number of samples. Most of the proposed parallel algorithms focus on modeling single distributions from many input samples efficiently, but these may not fit the large-scale scientific data processing scenario because they cannot utilize computing resources effectively. Histograms and the Gaussian Mixture Model (GMM) are the most popular distribution representations used to model scientific datasets. Therefore, we propose the use of multi-set histogram and GMM modeling algorithms for the scenario of large-scale scientific data processing. Our algorithms are developed using data-parallel primitives to achieve portability across different hardware architectures. We evaluate the performance of the proposed algorithms in detail and demonstrate use cases for scientific data processing.
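To illustrate the idea of building many small distributions at once from data-parallel primitives, here is a minimal sketch of multi-set histogram construction. It is not the authors' implementation. The primitive wrappers `dp_map`, `dp_sort_by_key`, and `dp_reduce_by_key` and the function `multiset_histograms` are hypothetical names, executed serially in plain Python. The sketch assumes each distribution is a block of `block_size` consecutive samples, with equal-width bins over a shared `[lo, hi)` range. Every sample is mapped to a composite key (distribution id, bin id). Sorting by that key and reducing by key then yields the counts for all histograms in one pass. GMM modeling is not covered here.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical primitive wrappers, run serially here. On a data-parallel
# backend each would correspond to one parallel operation.
def dp_map(fn, xs):
    return [fn(x) for x in xs]

def dp_sort_by_key(pairs):
    # Sort (key, value) pairs by key only; values move with their keys.
    return sorted(pairs, key=itemgetter(0))

def dp_reduce_by_key(pairs, op):
    # Combine values of adjacent equal keys (input must be sorted by key).
    out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        vals = [v for _, v in group]
        acc = vals[0]
        for v in vals[1:]:
            acc = op(acc, v)
        out.append((key, acc))
    return out

def multiset_histograms(samples, block_size, num_bins, lo, hi):
    """Build one histogram per block of `block_size` consecutive samples.

    Illustrative assumptions: equal-width bins over a shared [lo, hi) range,
    and out-of-range values are clamped into the first or last bin.
    A trailing partial block becomes its own (smaller) distribution.
    """
    width = (hi - lo) / num_bins

    def to_key(indexed):
        i, x = indexed
        b = int((x - lo) / width)
        b = min(max(b, 0), num_bins - 1)  # clamp to a valid bin index
        # Composite key (distribution id, bin id); value 1 is one count.
        return ((i // block_size, b), 1)

    keyed = dp_map(to_key, list(enumerate(samples)))
    counts = dp_reduce_by_key(dp_sort_by_key(keyed), lambda a, b: a + b)

    # Scatter the sparse (key, count) pairs into dense per-distribution arrays.
    num_sets = -(-len(samples) // block_size)  # ceiling division
    hists = [[0] * num_bins for _ in range(num_sets)]
    for (set_id, b), c in counts:
        hists[set_id][b] = c
    return hists

if __name__ == "__main__":
    data = [0.1, 0.2, 0.9, 0.5, 0.6, 0.7, 0.05, 0.95]
    print(multiset_histograms(data, block_size=4, num_bins=2, lo=0.0, hi=1.0))
    # prints [[2, 2], [1, 3]]
```

One reason to express the computation as sort plus reduce-by-key, rather than one kernel per distribution, is that the amount of parallel work scales with the total number of samples instead of the number of samples per distribution. This matches the abstract's point that per-distribution parallelism underuses hardware when each distribution has only a few samples.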
ISSN:	1999-4893
DOI:	10.3390/a14100285