SYSTEM AND METHOD FOR ESTIMATION OF ERROR BOUNDS FOR FILE SIZE CALCULATIONS USING MINHASH IN DEDUPLICATION SYSTEMS
Format: Patent
Language: English
Abstract: A system and method for estimating error bounds for file size calculations using MinHash in deduplication systems. The system includes one or more processors to determine a similarity score between a first file and a second file. The one or more processors further determine a size estimate for the combination of the first and second files based on the similarity score. Finally, the one or more processors determine a maximum error for the size estimate of the combined first and second files, wherein the first and second files are to be combined via deduplication and share at least one data segment.
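The three steps in the abstract (similarity score, combined size estimate, maximum error) can be illustrated with a minimal sketch. It is not the patented method: the abstract does not state the estimator or the error bound, so the following choices are assumptions. Each file is modeled as a set of unique segment fingerprints, and sizes are counted in segments (equivalent to bytes only for fixed-size segments). The similarity score is a k-hash MinHash estimate of the Jaccard index J. The combined deduplicated size uses the identity |A ∪ B| = (|A| + |B|) / (1 + J). The maximum error comes from a Hoeffding bound on the MinHash estimate at confidence 1 - delta, propagated through that formula. All function names (`minhash_signature`, `similarity`, `combined_size`, `max_error`) are hypothetical.

```python
import hashlib
import math


def _hash(seed: int, segment: bytes) -> int:
    """64-bit hash of a segment under the seed-th hash function (BLAKE2b with a per-seed salt)."""
    digest = hashlib.blake2b(segment, digest_size=8, salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(segments: set[bytes], k: int) -> list[int]:
    """k-hash MinHash signature: the minimum hash of the segment set under each of k hash functions."""
    if not segments:
        raise ValueError("MinHash of an empty set is undefined")
    return [min(_hash(i, s) for s in segments) for i in range(k)]


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimate the Jaccard similarity J as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def combined_size(size_a: float, size_b: float, j: float) -> float:
    """Deduplicated size of A ∪ B implied by Jaccard similarity j.

    From |A| + |B| = |A ∪ B| + |A ∩ B| and |A ∩ B| = j * |A ∪ B|,
    it follows that |A ∪ B| = (|A| + |B|) / (1 + j).
    """
    return (size_a + size_b) / (1.0 + j)


def max_error(size_a: float, size_b: float, j_hat: float, k: int, delta: float = 0.05) -> float:
    """Largest deviation of combined_size(j_hat) from the true size, holding with prob. >= 1 - delta.

    Assumes idealized independent hash functions, so each signature slot matches
    independently with probability J. Hoeffding then gives
    P(|j_hat - J| >= eps) <= 2 exp(-2 k eps^2), i.e. eps = sqrt(ln(2 / delta) / (2 k)).
    """
    if k <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("k must be positive and delta must lie in (0, 1)")
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * k))
    j_lo = max(0.0, j_hat - eps)
    j_hi = min(1.0, j_hat + eps)
    estimate = combined_size(size_a, size_b, j_hat)
    # combined_size is monotonically decreasing in j, so the extremes lie at the interval ends.
    return max(combined_size(size_a, size_b, j_lo) - estimate,
               estimate - combined_size(size_a, size_b, j_hi))
```

For example, two files of 400 segments each that share 200 segments have J = 200/600 = 1/3 and a true deduplicated size of 600 segments. `combined_size(400, 400, 1/3)` returns approximately 600. With k = 128 signature slots, `max_error` widens the estimate by the sampling error of the MinHash score. The k-hash variant is used here because its slot matches are independent Bernoulli(J) trials, so the Hoeffding bound applies directly. One-permutation or bottom-k MinHash would need a different bound.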