Reducing Fragmentation for In-line Deduplication Backup Storage via Exploiting Backup History and Cache Knowledge

Bibliographic details
Published in:	IEEE Transactions on Parallel and Distributed Systems, 2016-03, Vol. 27 (3), pp. 855-868
Authors:	Fu, Min; Feng, Dan; Hua, Yu; He, Xubin; Chen, Zuoning; Liu, Jingning; Xia, Wen; Huang, Fangting; Liu, Qing
Format:	Article
Language:	English
Keywords:	Algorithms; Back up systems; Backups; chunk fragmentation; Complement; Containers; Data deduplication; Distributed databases; Fragmentation; Garbage collection; Historic; Image restoration; Indexes; Merging; Metadata; Out of order; performance evaluation; Prefetching; storage system

Description
Abstract:	In backup systems, the chunks of each backup are physically scattered after deduplication, which causes a challenging fragmentation problem. We observe that fragmentation manifests as sparse containers and out-of-order containers. Sparse containers decrease both restore performance and garbage collection efficiency, while out-of-order containers decrease restore performance when the restore cache is small. To reduce fragmentation, we propose the History-Aware Rewriting algorithm (HAR) and the Cache-Aware Filter (CAF). HAR exploits historical information in backup systems to accurately identify and reduce sparse containers, and CAF exploits knowledge of the restore cache to identify the out-of-order containers that hurt restore performance. CAF efficiently complements HAR in datasets where out-of-order containers are dominant. To reduce the metadata overhead of garbage collection, we further propose the Container-Marker Algorithm (CMA), which identifies valid containers instead of valid chunks. Extensive experiments on real-world datasets show that HAR improves restore performance by 2.84-175.36× at the cost of rewriting only 0.5-2.03 percent of the data.
ISSN:	1045-9219, 1558-2183
DOI:	10.1109/TPDS.2015.2410781
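The abstract names HAR and CMA but does not describe their internals. The Python sketch below is one hypothetical reading of the abstract, not the authors' implementation. It assumes that HAR scores each container by the fraction of its capacity that a backup references, treats containers below a hypothetical `SPARSE_THRESHOLD` as sparse, and has the next backup rewrite duplicate chunks found in those containers, on the assumption that consecutive backups are similar. It also assumes that CMA keeps one manifest of referenced container IDs per backup. All function names and parameters are illustrative, and CAF's use of restore-cache knowledge is not modeled.

```python
from collections import defaultdict

# Hypothetical parameter: a container is treated as "sparse" for a backup when
# less than this fraction of its capacity is referenced by that backup.
SPARSE_THRESHOLD = 0.5


def container_utilization(backup_chunks, chunk_location, container_size):
    """Fraction of each container's capacity referenced by one backup.

    backup_chunks:  (fingerprint, size) pairs in backup-stream order.
    chunk_location: fingerprint -> container ID, for every chunk in the backup.
    """
    referenced = defaultdict(int)
    seen = set()
    for fp, size in backup_chunks:
        if fp in seen:
            continue  # a chunk repeated within one backup is counted once
        seen.add(fp)
        referenced[chunk_location[fp]] += size
    return {cid: used / container_size for cid, used in referenced.items()}


def sparse_containers(utilization, threshold=SPARSE_THRESHOLD):
    """Containers whose utilization falls below the threshold."""
    return {cid for cid, u in utilization.items() if u < threshold}


def should_rewrite(fp, chunk_location, inherited_sparse):
    """In the next backup, rewrite a duplicate chunk instead of referencing it
    when the previous backup found its container sparse."""
    cid = chunk_location.get(fp)
    return cid is not None and cid in inherited_sparse


def reclaimable_containers(all_containers, manifests, live_backups):
    """Container-level marking: a container stays valid if any live backup's
    manifest (its set of referenced container IDs) lists it."""
    valid = set()
    for backup_id in live_backups:
        valid |= manifests[backup_id]
    return set(all_containers) - valid


if __name__ == "__main__":
    loc = {"a": 1, "b": 1, "c": 2, "d": 2}
    backup = [("a", 10), ("c", 40), ("d", 40), ("a", 10)]
    util = container_utilization(backup, loc, container_size=100)
    print(util)  # {1: 0.1, 2: 0.8}
    sparse = sparse_containers(util)
    print(should_rewrite("a", loc, sparse), should_rewrite("c", loc, sparse))  # True False
```

Marking at container granularity means garbage collection tracks a set of container IDs per backup rather than every chunk fingerprint, which is the metadata saving the abstract attributes to CMA.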