Redundancy elimination by aggregation of multiple chunks

Bibliographic details
Main authors:	Glass, Gideon; Martynov, Maxim; Zhang, Qiwen; Lev Ran, Etai; Li, Dan
Format:	Patent
Language:	English

Description
Summary:	A data redundancy elimination system. In particular implementations, a method includes storing in a memory one or more aggregation trees, each aggregation tree comprising one or more base chunk nodes and one or more super chunk nodes, wherein each base chunk node comprises a chunk signature and corresponding raw data, and wherein super chunk nodes correspond to child base chunk nodes and include a chunk signature; receiving a data block; dividing the data block into a plurality of base chunks, each base chunk having a degree value characterizing the occurrence probability of the base chunk; computing chunk signatures for the plurality of base chunks; applying a super chunk rule to contiguous sequences of base chunks of the plurality of base chunks to create one or more aggregation trees, wherein the super chunk rule aggregates base chunks based on the respective occurrence probabilities of the base chunks; identifying one or more nodes in the one or more created aggregation trees that match corresponding nodes of the aggregation trees in the memory; compressing the received data block based on the identified nodes; and conditionally adding the one or more created aggregation trees to the memory.
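
The steps listed in the summary can be illustrated with a minimal Python sketch. The summary does not specify the chunking algorithm, the signature function, or the exact super chunk rule, so everything concrete here is an assumption made for illustration: boundaries are found with a CRC32 fingerprint over a sliding window, the degree is the number of extra trailing zero bits in that fingerprint (rarer boundaries get a higher degree), signatures are SHA-1 digests, and the hypothetical rule closes a super chunk at every base chunk of degree 1 or more. Trees are limited to one level, and the names `Store`, `split_into_base_chunks`, `apply_super_chunk_rule`, `compress`, and `decompress` are invented for this sketch.

```python
import hashlib
import zlib


class Store:
    """In-memory cache of aggregation trees (assumed layout, one level deep)."""
    def __init__(self):
        self.bases = {}   # base chunk signature -> raw data
        self.supers = {}  # super chunk signature -> list of child signatures


def signature(data):
    return hashlib.sha1(data).digest()


def split_into_base_chunks(block, window=16, mask_bits=5, min_size=16):
    """Return (raw, degree) pairs; a higher degree marks a rarer boundary."""
    chunks, start = [], 0
    mask = (1 << mask_bits) - 1
    for end in range(1, len(block) + 1):
        if end - start < min_size:
            continue
        fp = zlib.crc32(block[max(0, end - window):end])
        if fp & mask == 0:
            zeros = (fp & -fp).bit_length() - 1 if fp else 32
            chunks.append((block[start:end], zeros - mask_bits))
            start = end
    if start < len(block):
        chunks.append((block[start:], 0))  # trailing remainder, degree 0
    return chunks


def apply_super_chunk_rule(chunks):
    """Assumed rule: a contiguous run ends at a chunk with degree >= 1."""
    groups, current = [], []
    for chunk in chunks:
        current.append(chunk)
        if chunk[1] >= 1:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def compress(block, store):
    """Replace known super or base chunks by their signatures."""
    tokens = []
    for group in apply_super_chunk_rule(split_into_base_chunks(block)):
        base_sigs = [signature(raw) for raw, _ in group]
        super_sig = signature(b"".join(base_sigs))
        if len(group) > 1 and super_sig in store.supers:
            tokens.append(("super", super_sig))  # whole subtree matched
            continue
        for (raw, _), sig in zip(group, base_sigs):
            if sig in store.bases:
                tokens.append(("base", sig))
            else:
                tokens.append(("raw", raw))
                store.bases[sig] = raw           # add only unseen nodes
        if len(group) > 1:
            store.supers[super_sig] = base_sigs
    return tokens


def decompress(tokens, store):
    out = bytearray()
    for kind, value in tokens:
        if kind == "raw":
            out += value
        elif kind == "base":
            out += store.bases[value]
        else:
            out += b"".join(store.bases[s] for s in store.supers[value])
    return bytes(out)
```

Compressing the same block twice against one `Store` shows the intended effect: the second pass emits only signature references, with at most one token per super chunk. For simplicity the sketch decompresses against the same store; a deployed system would presumably keep a synchronized cache at the receiving end, which the summary does not describe.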