NetSync: A Network Adaptive and Deduplication-Inspired Delta Synchronization Approach for Cloud Storage Services
Delta sync (synchronization) is a key bandwidth-saving technique for cloud storage services. The representative delta sync utility, rsync , matches data chunks by sliding a search window byte-by-byte to maximize the redundancy detection for bandwidth efficiency. However, it is difficult for this pro...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on parallel and distributed systems 2022, Vol.33 (10), p.2554-2570 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Delta sync (synchronization) is a key bandwidth-saving technique for cloud storage services. The representative delta sync utility, rsync , matches data chunks by sliding a search window byte-by-byte to maximize the redundancy detection for bandwidth efficiency. However, it is difficult for this process to cater to the forthcoming high-bandwidth cloud storage services which require lightweight delta sync that can well support large files. Moreover, rsync employs invariant chunking and compression methods during the sync process, making it unable to cater to services from various network environments which require the sync approach to perform well under different network conditions. Inspired by the Content-Defined Chunking (CDC) technique used in data deduplication, we propose NetSync, a network adaptive and CDC-based lightweight delta sync approach with less computing and protocol (metadata) overheads than the state-of-the-art delta sync approaches. Besides, NetSync can choose appropriate compressing and chunking strategies for different network conditions. The key idea of NetSync is (1) to simplify the process of chunk matching by proposing a fast weak hash called FastFP that is piggybacked on the rolling hashes from CDC, and redesigning the delta sync protocol by exploiting deduplication locality and weak/strong hash properties; (2) to minimize the sync time by adaptively choosing chunking parameters and compression methods according to the current network conditions. Our evaluation results driven by both benchmark and real-world datasets suggest NetSync performs 2\times 2× -10\times 10× faster and supports 30\% 30% -80\% 80% more clients than the state-of-the-art rsync -base |
---|---|
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2022.3145025 |