Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high...

Bibliographic details
Main authors: Bhagwat, D., Eshghi, K., Long, D.D.E., Lillibridge, M.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 9
container_issue
container_start_page 1
container_title
container_volume
creator Bhagwat, D.
Eshghi, K.
Long, D.D.E.
Lillibridge, M.
description Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.
doi_str_mv 10.1109/MASCOT.2009.5366623
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5366623</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5366623</ieee_id><sourcerecordid>5366623</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1853-3383bff7016ab110a16a700b5886ed9760bef9307fd204463c25cbca599d094b3</originalsourceid><addsrcrecordid>eNo1kMtOAjEYRustcUCegE0fwBn_3lt3SPCSQFiAa9J2Wq2UYTIDib69JOLqLL7kJN9BaEygIgTMw2Kymi7XFQUwlWBSSsou0MgoTTjlnBuq5SUqKFOiBErVFRr8D0pfo4IIKkslmLlFg77_AqBABCvQYvZ96MIu4KfUNKn5eMQrb7N1Odzj1nY255BxHepjm5O3h7RvcNx32H8em23pbB9qHFMO2Fm_PbZ36Cba3IfRmUP0_jxbT1_L-fLlbTqZl55owUrGNHMxKiDSutM7e6ICcEJrGWqjJLgQDQMVawqcS-ap8M5bYUwNhjs2ROM_bwohbNou7Wz3szlnYb9RIVHS</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Extreme Binning: Scalable, parallel deduplication for chunk-based file backup</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Bhagwat, D. ; Eshghi, K. ; Long, D.D.E. ; Lillibridge, M.</creator><creatorcontrib>Bhagwat, D. ; Eshghi, K. ; Long, D.D.E. ; Lillibridge, M.</creatorcontrib><description>Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. 
Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.</description><identifier>ISSN: 1526-7539</identifier><identifier>ISBN: 1424449278</identifier><identifier>ISBN: 9781424449279</identifier><identifier>EISSN: 2375-0227</identifier><identifier>EISBN: 9781424449286</identifier><identifier>EISBN: 1424449286</identifier><identifier>DOI: 10.1109/MASCOT.2009.5366623</identifier><language>eng</language><publisher>IEEE</publisher><subject>Digital images ; Electronic mail ; Intrusion detection ; Laboratories ; Milling machines ; Robustness ; Routing ; Space technology ; Throughput ; Web pages</subject><ispartof>2009 IEEE International Symposium on Modeling, Analysis &amp; Simulation of Computer and Telecommunication Systems, 2009, p.1-9</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1853-3383bff7016ab110a16a700b5886ed9760bef9307fd204463c25cbca599d094b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5366623$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5366623$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Bhagwat, D.</creatorcontrib><creatorcontrib>Eshghi, K.</creatorcontrib><creatorcontrib>Long, 
D.D.E.</creatorcontrib><creatorcontrib>Lillibridge, M.</creatorcontrib><title>Extreme Binning: Scalable, parallel deduplication for chunk-based file backup</title><title>2009 IEEE International Symposium on Modeling, Analysis &amp; Simulation of Computer and Telecommunication Systems</title><addtitle>MASCOT</addtitle><description>Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. 
Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.</description><subject>Digital images</subject><subject>Electronic mail</subject><subject>Intrusion detection</subject><subject>Laboratories</subject><subject>Milling machines</subject><subject>Robustness</subject><subject>Routing</subject><subject>Space technology</subject><subject>Throughput</subject><subject>Web pages</subject><issn>1526-7539</issn><issn>2375-0227</issn><isbn>1424449278</isbn><isbn>9781424449279</isbn><isbn>9781424449286</isbn><isbn>1424449286</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1kMtOAjEYRustcUCegE0fwBn_3lt3SPCSQFiAa9J2Wq2UYTIDib69JOLqLL7kJN9BaEygIgTMw2Kymi7XFQUwlWBSSsou0MgoTTjlnBuq5SUqKFOiBErVFRr8D0pfo4IIKkslmLlFg77_AqBABCvQYvZ96MIu4KfUNKn5eMQrb7N1Odzj1nY255BxHepjm5O3h7RvcNx32H8em23pbB9qHFMO2Fm_PbZ36Cba3IfRmUP0_jxbT1_L-fLlbTqZl55owUrGNHMxKiDSutM7e6ICcEJrGWqjJLgQDQMVawqcS-ap8M5bYUwNhjs2ROM_bwohbNou7Wz3szlnYb9RIVHS</recordid><startdate>200909</startdate><enddate>200909</enddate><creator>Bhagwat, D.</creator><creator>Eshghi, K.</creator><creator>Long, D.D.E.</creator><creator>Lillibridge, M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200909</creationdate><title>Extreme Binning: Scalable, parallel deduplication for chunk-based file backup</title><author>Bhagwat, D. ; Eshghi, K. ; Long, D.D.E. 
; Lillibridge, M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1853-3383bff7016ab110a16a700b5886ed9760bef9307fd204463c25cbca599d094b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Digital images</topic><topic>Electronic mail</topic><topic>Intrusion detection</topic><topic>Laboratories</topic><topic>Milling machines</topic><topic>Robustness</topic><topic>Routing</topic><topic>Space technology</topic><topic>Throughput</topic><topic>Web pages</topic><toplevel>online_resources</toplevel><creatorcontrib>Bhagwat, D.</creatorcontrib><creatorcontrib>Eshghi, K.</creatorcontrib><creatorcontrib>Long, D.D.E.</creatorcontrib><creatorcontrib>Lillibridge, M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bhagwat, D.</au><au>Eshghi, K.</au><au>Long, D.D.E.</au><au>Lillibridge, M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Extreme Binning: Scalable, parallel deduplication for chunk-based file backup</atitle><btitle>2009 IEEE International Symposium on Modeling, Analysis &amp; Simulation of Computer and Telecommunication Systems</btitle><stitle>MASCOT</stitle><date>2009-09</date><risdate>2009</risdate><spage>1</spage><epage>9</epage><pages>1-9</pages><issn>1526-7539</issn><eissn>2375-0227</eissn><isbn>1424449278</isbn><isbn>9781424449279</isbn><eisbn>9781424449286</eisbn><eisbn>1424449286</eisbn><abstract>Data deduplication is an 
essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.</abstract><pub>IEEE</pub><doi>10.1109/MASCOT.2009.5366623</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1526-7539
ispartof 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009, p.1-9
issn 1526-7539
2375-0227
language eng
recordid cdi_ieee_primary_5366623
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Digital images
Electronic mail
Intrusion detection
Laboratories
Milling machines
Robustness
Routing
Space technology
Throughput
Web pages
title Extreme Binning: Scalable, parallel deduplication for chunk-based file backup
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A42%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Extreme%20Binning:%20Scalable,%20parallel%20deduplication%20for%20chunk-based%20file%20backup&rft.btitle=2009%20IEEE%20International%20Symposium%20on%20Modeling,%20Analysis%20&%20Simulation%20of%20Computer%20and%20Telecommunication%20Systems&rft.au=Bhagwat,%20D.&rft.date=2009-09&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=1526-7539&rft.eissn=2375-0227&rft.isbn=1424449278&rft.isbn_list=9781424449279&rft_id=info:doi/10.1109/MASCOT.2009.5366623&rft_dat=%3Cieee_6IE%3E5366623%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424449286&rft.eisbn_list=1424449286&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5366623&rfr_iscdi=true
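
The description field above says that each file is allocated to exactly one backup node by a stateless routing algorithm, and that file similarity rather than locality drives deduplication. The record does not say how similarity is measured or how the route is computed, so the following is only a minimal sketch under stated assumptions: fixed-size chunks, SHA-1 chunk hashes, the smallest chunk hash as the file's similarity signature, and a modulo over the node count. The names CHUNK_SIZE, chunk_hashes, representative_id, and route are hypothetical and chosen for illustration.

```python
import hashlib

# Hypothetical fixed chunk size; an assumption made for this sketch only.
CHUNK_SIZE = 4096


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-1 hex digest."""
    if not data:
        # Treat an empty file as a single empty chunk so it still gets a signature.
        return [hashlib.sha1(b"").hexdigest()]
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def representative_id(data: bytes) -> str:
    """Assumed similarity signature: the smallest chunk hash of the file.

    Files that share many chunks are likely to share this minimum as well.
    """
    return min(chunk_hashes(data))


def route(data: bytes, num_nodes: int) -> int:
    """Stateless routing: the target node depends only on the file's content.

    No shared index or per-node state is consulted, so any client can
    compute the same answer independently.
    """
    return int(representative_id(data), 16) % num_nodes


if __name__ == "__main__":
    file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    file_b = b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE  # same chunks, different order
    print(route(file_a, 4), route(file_b, 4))
```

Because the route is derived from the file alone, two files with the same set of chunks land on the same node in this sketch, which is what lets each node deduplicate without consulting the others.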