A Novel Pipeline Approach for Efficient Big Data Broadcasting

Big-data computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm. Thus, the demand for building a service stack to distribute, manage, and process massive data sets has risen drastically. In th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on knowledge and data engineering 2016-01, Vol.28 (1), p.17-28
Hauptverfasser: Wu, Chi-Jen, Ku, Chin-Fu, Ho, Jan-Ming, Chen, Ming-Syan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Big-data computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm. Thus, the demand for building a service stack to distribute, manage, and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem for a single source node to broadcast a big chunk of data to a set of nodes with the objective of minimizing the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed datacenters. This problem is one of the fundamental problems in distributed computing and is known to be NP-hard in heterogeneous environments. We model the Big-data broadcasting problem into a LockStep Broadcast Tree (LSBT) problem. The main idea of the LSBT model is to define a basic unit of upload bandwidth, r, such that a node with capacity c broadcasts data to a set of [c/r] children at the rater. Note that r is a parameter to be optimized as part of the LSBT problem. We further divide the broadcast data into m chunks. These data chunks can then be broadcast down the LSBT in a pipeline manner. In a homogeneous network environment in which each node has the same upload capacity c, we show that the optimal uplink rate r* of LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, we present an O(nlog 2 n) algorithm to select an optimal uplink rater* and to construct an optimal LSBT. Numerical results show that our approach performs well with less maximum completion time and lower computational complexity than other efficient solutions in literature.
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2015.2468714