Cloud-based industrial file handling and duplication removal using source-based deduplication technique
Main authors:
Format: Conference proceedings
Language: English
Subjects:
Online access: Full text
Summary: Data deduplication is the process of eliminating redundant, duplicated data: each chunk of data is represented by a unique value, which is referenced by every file that contains that chunk. Data deduplication techniques are mainly applied in cloud-based systems to reduce storage space and improve connection bandwidth. In this paper, we introduce a data deduplication optimization technique for the data storage of cloud-based systems. The proposed technique optimizes deduplication by combining the source-based and in-line methods: the source-based method operates at the source that holds the data, while the in-line method operates in RAM, where the data resides momentarily before the I/O write. Moreover, the technique applies a content-based chunking algorithm with variable-size chunks using the Rabin-Karp Rolling Hash (RKRH), a chunking algorithm that splits data files into segments of different sizes. The overall process computes the hash value of each data chunk, which serves as its fingerprint; a chunk-availability check then determines whether that chunk already exists in storage, and if it does not, a reference to the chunk is added and its hash value is stored as a key. The technique also compresses each data chunk to reduce redundancy within the chunk. In practice, the proposed technique achieves a data deduplication ratio of 33 percent and an average upload latency of five seconds. Finally, the approach can be used with any file type, since data is handled as a byte stream.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0030989