Cluster-to-cluster data transfer with data compression over wide-area networks

The recent emergence of ultra high-speed networks up to 100 Gb/s has posed numerous challenges and has led to many investigations on efficient protocols to saturate 100 Gb/s links. However, end-to-end data transfers involve many components, not only protocols, affecting overall transfer performance....

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of parallel and distributed computing 2015-05, Vol.79-80 (C), p.90-103
Hauptverfasser: Jung, Eun-Sung, Kettimuthu, Rajkumar, Vishwanath, Venkatram
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The recent emergence of ultra high-speed networks up to 100 Gb/s has posed numerous challenges and has led to many investigations on efficient protocols to saturate 100 Gb/s links. However, end-to-end data transfers involve many components, not only protocols, affecting overall transfer performance. These components include disk I/O subsystem, additional computation associated with data streams, and network adapters. For example, achievable bandwidth by TCP may not be implementable if disk I/O or CPU becomes a bottleneck in end-to-end data transfer. In this paper, we first model all the system components involved in end-to-end data transfer as a graph. We then formulate the problem whose goal is to achieve maximum data transfer throughput using parallel data flows. We also propose a variable data flow GridFTP XIO stack to improve data transfer with data compression. Our contributions lie in how to optimize data transfers considering all the system components involved rather than in accurately modeling all the system components involved. Our proposed formulations and solutions are evaluated through experiments on the ESnet 100G testbed and a wide-area cluster-to-cluster testbed. The experimental results on the ESnet 100G testbed show that our approach is several times faster than Globus Online—8 × faster for datasets with many 10 MB files and 3–4 × faster for other datasets of larger size files. The experimental results on the cluster-to-cluster testbed show that our variable data flow approach is up to 4 × faster than a normal cluster data transfer. •We model all the system components involved in end-to-end data transfer as a graph.•We formulate the problem to optimize data transfer throughput using parallel flows.•We propose a novel software I/O stack with variable parallel data flows per layer.•Optimized parallel data flows improve overall data transfer throughput.•The novel I/O stack with data compression alleviates network bottleneck situations.
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2014.09.008