Improved Energy Efficiency for Multithreaded Kernels through Model-Based Autotuning
Main authors: | Qasem, A.; Cade, M. J.; Tamir, D. |
---|---|
Format: | Conference proceeding |
Language: | eng |
Keywords: | |
container_end_page | 6 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Qasem, A.; Cade, M. J.; Tamir, D. |
description | In the last few years, the emergence of multicore architectures has revolutionized the landscape of high-performance computing. The multicore shift has not only increased the per-node performance potential of computer systems but has also made great strides in curbing power and heat dissipation. As we look to the future, however, the gains in performance and energy consumption are not going to come from hardware alone. Software needs to play a key role in achieving a high fraction of peak performance and keeping the energy consumption within the desired envelope. To attain this goal, performance-enhancing and energy-conserving software needs to carefully orchestrate many architecture-sensitive parameters. In particular, the presence of shared caches on multicore architectures makes it necessary to consider, in concert, issues related to both parallelism and data locality to achieve the desired power-performance ratio. This paper studies the complex interaction among several code transformations that affect data locality, problem decomposition, and selection of loops for parallelism. We characterize this interaction using static compiler analysis and generate a pruned search space suitable for efficient autotuning. We also extend a heuristic based on the number of threads, data reuse patterns, and the size and configuration of the shared cache to estimate good synchronization intervals for conserving energy in parallel code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific and engineering kernels on four different multicore platforms. Results of the experimental study reveal several interesting properties of the transformation search space and demonstrate the effectiveness of the heuristic in predicting good synchronization intervals that reduce energy consumption without a significant degradation in performance. |
doi_str_mv | 10.1109/GREEN.2012.6200963 |
format | Conference Proceeding |
isbn | 1467309680; 9781467309684 |
eisbn | 1467309672; 9781467309660; 1467309664; 9781467309677 |
publisher | IEEE |
date | 2012-04 |
link | https://ieeexplore.ieee.org/document/6200963 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2166-546X |
ispartof | 2012 IEEE Green Technologies Conference, 2012, p.1-6 |
issn | 2166-546X (print); 2166-5478 (electronic) |
language | eng |
recordid | cdi_ieee_primary_6200963 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Computational modeling; Energy consumption; Multicore processing; Parallel processing; Power demand; Synchronization; Tuning |
title | Improved Energy Efficiency for Multithreaded Kernels through Model-Based Autotuning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T08%3A04%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Improved%20Energy%20Efficiency%20for%20Multithreaded%20Kernels%20through%20Model-Based%20Autotuning&rft.btitle=2012%20IEEE%20Green%20Technologies%20Conference&rft.au=Qasem,%20A.&rft.date=2012-04&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.issn=2166-546X&rft.eissn=2166-5478&rft.isbn=1467309680&rft.isbn_list=9781467309684&rft_id=info:doi/10.1109/GREEN.2012.6200963&rft_dat=%3Cieee_6IE%3E6200963%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1467309672&rft.eisbn_list=9781467309660&rft.eisbn_list=1467309664&rft.eisbn_list=9781467309677&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6200963&rfr_iscdi=true |
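The description above mentions a heuristic that uses the number of threads, data reuse patterns, and the size and configuration of the shared cache to estimate a good synchronization interval, together with a pruned search space for autotuning. The paper's actual heuristic is not reproduced in this record, so the following Python sketch is a hypothetical illustration under stated assumptions, not the authors' method. It assumes the synchronization interval is the number of loop iterations each thread executes between barriers, models the data touched in one interval as per-thread private data plus cross-thread shared data counted once, reserves one cache way as slack for conflict misses, prunes candidate intervals whose footprint overflows the shared cache, and picks the largest survivor. The names `SharedCache`, `footprint`, `prune`, `pick_interval`, and `shared_fraction` are all invented for this example.

```python
# Hypothetical sketch: NOT the heuristic from Qasem, Cade and Tamir (GREEN 2012).
# All names, parameters and the footprint model are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SharedCache:
    """Assumed description of a shared last-level cache."""
    size_bytes: int
    associativity: int
    line_bytes: int = 64


def usable_bytes(cache: SharedCache) -> int:
    # Assumption: keep one way per set free as slack for conflict misses.
    return cache.size_bytes * (cache.associativity - 1) // cache.associativity


def footprint(interval: int, threads: int, bytes_per_iter: int,
              shared_fraction: float, cache: SharedCache) -> float:
    """Estimated bytes all threads touch between two barriers."""
    if bytes_per_iter <= 0 or not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("bytes_per_iter must be > 0 and shared_fraction in [0, 1]")
    # Round each iteration's data up to whole cache lines.
    lines = -(-bytes_per_iter // cache.line_bytes)
    per_iter = lines * cache.line_bytes
    # Private data is replicated per thread; data reused across threads
    # (shared_fraction of each iteration's data) occupies the cache once.
    private = per_iter * (1.0 - shared_fraction) * threads
    shared = per_iter * shared_fraction
    return interval * (private + shared)


def prune(candidates: Sequence[int], threads: int, bytes_per_iter: int,
          shared_fraction: float, cache: SharedCache) -> List[int]:
    """Drop candidate intervals whose footprint would overflow the cache."""
    limit = usable_bytes(cache)
    return [c for c in candidates
            if footprint(c, threads, bytes_per_iter, shared_fraction, cache) <= limit]


def pick_interval(candidates: Sequence[int], threads: int, bytes_per_iter: int,
                  shared_fraction: float, cache: SharedCache) -> int:
    """Largest surviving interval; fall back to the smallest candidate."""
    if not candidates:
        raise ValueError("need at least one candidate interval")
    kept = prune(candidates, threads, bytes_per_iter, shared_fraction, cache)
    return max(kept) if kept else min(candidates)


if __name__ == "__main__":
    llc = SharedCache(size_bytes=8 * 1024 * 1024, associativity=16)
    candidates = [2 ** k for k in range(21)]  # 1 .. 1,048,576 iterations
    for t in (4, 8):
        print(t, pick_interval(candidates, t, 64, 0.25, llc))
```

With an 8 MiB, 16-way shared cache, 64 bytes per iteration, and a quarter of each iteration's data shared, the script prints `4 32768` and then `8 16384`: more threads competing for the shared cache leave room for fewer iterations between barriers. Under this model an autotuner would time only the intervals returned by `prune` instead of the full candidate list.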