Improved Energy Efficiency for Multithreaded Kernels through Model-Based Autotuning

In the last few years, the emergence of multicore architectures has revolutionized the landscape of high-performance computing. The multicore shift has not only increased the per-node performance potential of computer systems but also has made great strides in curbing power and heat dissipation. As...

Bibliographic details
Main authors: Qasem, A., Cade, M. J., Tamir, D.
Format: Conference proceeding
Language: eng
container_end_page 6
container_issue
container_start_page 1
container_title
container_volume
creator Qasem, A.
Cade, M. J.
Tamir, D.
description In the last few years, the emergence of multicore architectures has revolutionized the landscape of high-performance computing. The multicore shift has not only increased the per-node performance potential of computer systems but has also made great strides in curbing power and heat dissipation. As we look to the future, however, the gains in performance and energy consumption are not going to come from hardware alone. Software needs to play a key role in achieving a high fraction of peak and keeping the energy consumption within the desired envelope. To attain this goal, performance-enhancing and energy-conserving software needs to carefully orchestrate many architecture-sensitive parameters. In particular, the presence of shared caches on multicore architectures makes it necessary to consider, in concert, issues related to both parallelism and data locality to achieve the desired power-performance ratio. This paper studies the complex interaction among several code transformations that affect data locality, problem decomposition, and selection of loops for parallelism. We characterize this interaction using static compiler analysis and generate a pruned search space suitable for efficient autotuning. We also extend a heuristic based on the number of threads, data reuse patterns, and the size and configuration of the shared cache to estimate a good synchronization interval for conserving energy in parallel code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific and engineering kernels on four different multicore platforms. Results of the experimental study reveal several interesting properties of the transformation search space and demonstrate the effectiveness of the heuristic in predicting good synchronization intervals that reduce energy consumption without a significant degradation in performance.
doi_str_mv 10.1109/GREEN.2012.6200963
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 2166-546X
ispartof 2012 IEEE Green Technologies Conference, 2012, p.1-6
issn 2166-546X
2166-5478
language eng
recordid cdi_ieee_primary_6200963
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Computational modeling
Energy consumption
Multicore processing
Parallel processing
Power demand
Synchronization
Tuning
title Improved Energy Efficiency for Multithreaded Kernels through Model-Based Autotuning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T08%3A04%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Improved%20Energy%20Efficiency%20for%20Multithreaded%20Kernels%20through%20Model-Based%20Autotuning&rft.btitle=2012%20IEEE%20Green%20Technologies%20Conference&rft.au=Qasem,%20A.&rft.date=2012-04&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.issn=2166-546X&rft.eissn=2166-5478&rft.isbn=1467309680&rft.isbn_list=9781467309684&rft_id=info:doi/10.1109/GREEN.2012.6200963&rft_dat=%3Cieee_6IE%3E6200963%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1467309672&rft.eisbn_list=9781467309660&rft.eisbn_list=1467309664&rft.eisbn_list=9781467309677&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6200963&rfr_iscdi=true