Method for increasing deduplication speed on data streams fragmented by shuffling
A computer-implemented method for deduplicating an incoming data sequence can include the steps of storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index, sequentially storing signature values for at least some of the plurality of data blocklets...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Patent |
Sprache: | eng |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A computer-implemented method for deduplicating an incoming data sequence can include the steps of storing signature values for a plurality of data blocklets of a parent data sequence in a deduplication index, sequentially storing signature values for at least some of the plurality of data blocklets of the parent data sequence in a first storage location outside of the deduplication index, determining that a first data blocklet in the incoming data sequence is absent from the parent data sequence, storing a signature value for the first data blocklet in a second storage location outside of the deduplication index, storing a guarded link linking the first data blocklet to the second data blocklet into the second storage location, determining that a second data blocklet that follows the first data blocklet in the incoming data sequence is present in the parent data sequence, the second data blocklet having a signature value that is stored in the first storage location, and copying at least a portion of the contents of the first storage location and the second storage location into a cache to expedite access during deduplication of the incoming data sequence. |
---|