Impact of temperature on hard disk drive reliability in large datacenters

Bibliographic details
Main authors: Sankar, S., Shaw, M., Vaid, K.
Format: Conference proceeding
Language: eng
container_end_page 537
container_issue
container_start_page 530
container_title
container_volume
creator Sankar, S.
Shaw, M.
Vaid, K.
description When datacenters are pushed to the limits of their operational efficiency, reducing failure rates becomes critical for keeping servers healthy. In this experience report, we present a dense storage case study from a large population of servers housing tens of thousands of disk drives. Previous studies have presented divergent results on the correlation between temperature and hard disk drive failures. In our paper, we establish a correlation between temperature and failures observed at three location granularities: a) across drive locations inside a server chassis, b) across server locations in a rack, and c) across multiple racks in a datacenter. We also establish that temperature correlates more strongly with drive failures than disk utilization does. Thus, we show that temperature-aware server and datacenter design plays a pivotal role in datacenter reliability. Following our case study, we present a reliability model for estimating hard disk drive failures correlated with the datacenter operating temperature. The model is a physical Arrhenius model with empirically derived coefficients. We apply it to select the datacenter inlet temperature setpoint for two different server storage configurations. Finally, with the help of a datacenter cost discussion, we highlight the need to incorporate reliability-aware datacenter design for increased efficiency in large-scale datacenters.
doi_str_mv 10.1109/DSN.2011.5958265
format Conference Proceeding
eisbn 9781424492336
1424492319
9781424492312
1424492335
isbn 9781424492329
1424492327
publisher IEEE
fulltext fulltext_linktorsrc
identifier ISSN: 1530-0889
ispartof 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011, p.530-537
issn 1530-0889
2158-3927
language eng
recordid cdi_ieee_primary_5958265
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Correlation
Datacenter
Disk drives
Drives
Hard Disk Drive
Hard disks
Reliability
Servers
Temperature
Temperature measurement
title Impact of temperature on hard disk drive reliability in large datacenters
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T14%3A59%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Impact%20of%20temperature%20on%20hard%20disk%20drive%20reliability%20in%20large%20datacenters&rft.btitle=2011%20IEEE/IFIP%2041st%20International%20Conference%20on%20Dependable%20Systems%20&%20Networks%20(DSN)&rft.au=Sankar,%20S.&rft.date=2011-06&rft.spage=530&rft.epage=537&rft.pages=530-537&rft.issn=1530-0889&rft.eissn=2158-3927&rft.isbn=9781424492329&rft.isbn_list=1424492327&rft_id=info:doi/10.1109/DSN.2011.5958265&rft_dat=%3Cieee_6IE%3E5958265%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424492336&rft.eisbn_list=1424492319&rft.eisbn_list=9781424492312&rft.eisbn_list=1424492335&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5958265&rfr_iscdi=true
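
The description above mentions a physical Arrhenius model with empirically derived coefficients but does not give those coefficients. As a rough illustration only, the following Python sketch shows the textbook Arrhenius acceleration factor, which relates the failure rate at an operating temperature to the failure rate at a reference temperature. The function names and the activation energy value are hypothetical placeholders and are not taken from the paper.

```python
import math

# Boltzmann constant in electronvolts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_ref_c, t_op_c, activation_energy_ev):
    """Return the ratio of the failure rate at t_op_c to that at t_ref_c.

    Temperatures are given in degrees Celsius and converted to kelvin.
    This is the generic textbook form exp(Ea/k * (1/T_ref - 1/T_op)),
    not the fitted model from the paper.
    """
    t_ref_k = t_ref_c + 273.15
    t_op_k = t_op_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_op_k
    )
    return math.exp(exponent)


def scaled_failure_rate(base_rate, t_ref_c, t_op_c, activation_energy_ev):
    """Scale a failure rate known at t_ref_c to the temperature t_op_c."""
    return base_rate * arrhenius_acceleration(t_ref_c, t_op_c, activation_energy_ev)


if __name__ == "__main__":
    # Hypothetical activation energy chosen only for illustration.
    EA_EV = 0.5
    for temp_c in (30, 40, 50):
        factor = arrhenius_acceleration(30, temp_c, EA_EV)
        print(f"{temp_c} C relative to 30 C: acceleration factor {factor:.2f}")
```

With a positive activation energy the factor is 1 at the reference temperature and grows as the operating temperature rises, which is the qualitative behavior one would weigh against cooling cost when choosing an inlet temperature setpoint. Real use would require coefficients fitted to observed failure data.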