Exploring data parallelism and locality in wide area networks

Bibliographic Details
Main authors: Yunhong Gu, Grossman, R.
Format: Conference proceeding
Language: English
Description
Summary: Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming structure. Work to date, for example MapReduce and Hadoop, has focused on systems within a data center. In this paper, we present Sphere, a cloud computing system that targets distributed data-intensive applications over wide area networks. Sphere uses a data-parallel computing model that views the processing of distributed datasets as applying a group of operators to each element in the datasets. Because Sphere is a cloud computing system, application developers can use the Sphere API to write very simple code that processes distributed datasets in parallel, while details such as data location, server heterogeneity, load balancing, and fault tolerance remain transparent to them. Unlike MapReduce or Hadoop, Sphere supports distributed data processing on a global scale by exploiting data parallelism and locality in systems over wide area networks.
ISSN: 2151-1683
DOI: 10.1109/MTAGS.2008.4777906
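
Illustration: the summary describes a data-parallel model in which an operator is applied to each element of a distributed dataset, with the work done close to where the data is stored. The following is a minimal Python sketch of that idea only. It does not use or reproduce the Sphere API; the site names, the SEGMENTS layout, and every function name are hypothetical, and threads stand in for remote servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical layout: each "site" holds some segments of one dataset.
# These names are illustrative only and are not part of the Sphere API.
SEGMENTS: Dict[str, List[List[int]]] = {
    "site_a": [[1, 2, 3], [4, 5]],
    "site_b": [[6, 7], [8, 9, 10]],
}


def apply_operator(op: Callable[[int], int], segment: List[int]) -> List[int]:
    """Apply a user-defined operator to every element of one segment."""
    return [op(x) for x in segment]


def process_at_site(op: Callable[[int], int], site: str) -> List[List[int]]:
    """Process all segments held at one site.

    This stands in for running the operator where the data lives,
    instead of moving the data to a central node.
    """
    return [apply_operator(op, seg) for seg in SEGMENTS[site]]


def run(op: Callable[[int], int]) -> Dict[str, List[List[int]]]:
    """Run the operator at every site in parallel, one worker per site."""
    sites = sorted(SEGMENTS)
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        results = pool.map(lambda s: process_at_site(op, s), sites)
        return dict(zip(sites, results))


def square(x: int) -> int:
    """Example user-defined operator."""
    return x * x


result = run(square)
print(result)
```

In this sketch the developer writes only the operator (square); placement and parallel execution are handled by run. Concerns the summary also mentions, such as load balancing and fault tolerance, are left out entirely.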