A survey on bandwidth-aware geo-distributed frameworks for big-data analytics

In the era of global-scale services, organisations produce huge volumes of data, often distributed across multiple data centres, separated by vast geographical distances. While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercia...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of Big Data 2021-02, Vol.8 (1), p.1-26, Article 40
Hauptverfasser: Bergui, Mohammed, Najah, Said, Nikolov, Nikola S.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the era of global-scale services, organisations produce huge volumes of data, often distributed across multiple data centres, separated by vast geographical distances. While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running jobs across geo-distributed data centres. The necessity to utilise such infrastructure introduces new challenges in the data analytics process due to bandwidth limitations of the inter-data-centre communication. In this article, we discuss challenges and survey the latest geo-distributed big-data analytics frameworks and schedulers (based on MapReduce and Spark) with WAN-bandwidth awareness.
ISSN:2196-1115
2196-1115
DOI:10.1186/s40537-021-00427-9