Dynamically allocating processor resources between nearby and distant ILP
Main authors: | Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H. |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 37 |
---|---|
container_issue | |
container_start_page | 26 |
container_title | |
container_volume | |
creator | Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H. |
description | Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64. |
doi_str_mv | 10.1109/ISCA.2001.937428 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_937428</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>937428</ieee_id><sourcerecordid>937428</sourcerecordid><originalsourceid>FETCH-ieee_primary_9374283</originalsourceid><addsrcrecordid>eNp9jj0LwjAYhF_8AFt1F6f8gdaktYkZxQ8sOAg6uJW0vkqkppJUpP_egs4udxzPHRzAhNGQMSpn6XG1DCNKWShjMY8WHfCiRCSBYPG5Cz4VXCaM8Uj0wGOUxwFfSDEA37l7O5Iy4R6k68aohy5UWTaklapQtTY38rRVgc5Vllh01cu2geRYvxENMahs3rbNhVy0q5WpSbo_jKB_VaXD8c-HMN1uTqtdoBExe1r9ULbJvk_jv_ADhzA_VA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Dynamically allocating processor resources between nearby and distant ILP</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H.</creator><creatorcontrib>Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H.</creatorcontrib><description>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. 
It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</description><identifier>ISSN: 1063-6897</identifier><identifier>ISBN: 0769511627</identifier><identifier>ISBN: 9780769511627</identifier><identifier>EISSN: 2575-713X</identifier><identifier>DOI: 10.1109/ISCA.2001.937428</identifier><language>eng</language><publisher>IEEE</publisher><subject>Degradation ; Delay ; Hardware ; Microarchitecture ; Out of order ; Parallel processing ; Registers ; Resource management ; Time factors ; Yarn</subject><ispartof>Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.26-37</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/937428$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,4036,4037,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/937428$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Balasubramonian, R.</creatorcontrib><creatorcontrib>Dwarkadas, S.</creatorcontrib><creatorcontrib>Albonesi, D.H.</creatorcontrib><title>Dynamically allocating processor resources between nearby and distant ILP</title><title>Proceedings 28th Annual International Symposium on Computer Architecture</title><addtitle>ISCA</addtitle><description>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to 
increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. 
The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</description><subject>Degradation</subject><subject>Delay</subject><subject>Hardware</subject><subject>Microarchitecture</subject><subject>Out of order</subject><subject>Parallel processing</subject><subject>Registers</subject><subject>Resource management</subject><subject>Time factors</subject><subject>Yarn</subject><issn>1063-6897</issn><issn>2575-713X</issn><isbn>0769511627</isbn><isbn>9780769511627</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2001</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNp9jj0LwjAYhF_8AFt1F6f8gdaktYkZxQ8sOAg6uJW0vkqkppJUpP_egs4udxzPHRzAhNGQMSpn6XG1DCNKWShjMY8WHfCiRCSBYPG5Cz4VXCaM8Uj0wGOUxwFfSDEA37l7O5Iy4R6k68aohy5UWTaklapQtTY38rRVgc5Vllh01cu2geRYvxENMahs3rbNhVy0q5WpSbo_jKB_VaXD8c-HMN1uTqtdoBExe1r9ULbJvk_jv_ADhzA_VA</recordid><startdate>2001</startdate><enddate>2001</enddate><creator>Balasubramonian, R.</creator><creator>Dwarkadas, S.</creator><creator>Albonesi, D.H.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>2001</creationdate><title>Dynamically allocating processor resources between nearby and distant ILP</title><author>Balasubramonian, R. ; Dwarkadas, S. 
; Albonesi, D.H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ieee_primary_9374283</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Degradation</topic><topic>Delay</topic><topic>Hardware</topic><topic>Microarchitecture</topic><topic>Out of order</topic><topic>Parallel processing</topic><topic>Registers</topic><topic>Resource management</topic><topic>Time factors</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Balasubramonian, R.</creatorcontrib><creatorcontrib>Dwarkadas, S.</creatorcontrib><creatorcontrib>Albonesi, D.H.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Balasubramonian, R.</au><au>Dwarkadas, S.</au><au>Albonesi, D.H.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Dynamically allocating processor resources between nearby and distant ILP</atitle><btitle>Proceedings 28th Annual International Symposium on Computer Architecture</btitle><stitle>ISCA</stitle><date>2001</date><risdate>2001</risdate><spage>26</spage><epage>37</epage><pages>26-37</pages><issn>1063-6897</issn><eissn>2575-713X</eissn><isbn>0769511627</isbn><isbn>9780769511627</isbn><abstract>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). 
Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</abstract><pub>IEEE</pub><doi>10.1109/ISCA.2001.937428</doi></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1063-6897 |
ispartof | Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.26-37 |
issn | 1063-6897 ; 2575-713X |
language | eng |
recordid | cdi_ieee_primary_937428 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Degradation ; Delay ; Hardware ; Microarchitecture ; Out of order ; Parallel processing ; Registers ; Resource management ; Time factors ; Yarn |
title | Dynamically allocating processor resources between nearby and distant ILP |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A39%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Dynamically%20allocating%20processor%20resources%20between%20nearby%20and%20distant%20ILP&rft.btitle=Proceedings%2028th%20Annual%20International%20Symposium%20on%20Computer%20Architecture&rft.au=Balasubramonian,%20R.&rft.date=2001&rft.spage=26&rft.epage=37&rft.pages=26-37&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1109/ISCA.2001.937428&rft_dat=%3Cieee_6IE%3E937428%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=937428&rfr_iscdi=true |
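
The speedups quoted in the description (1.17 overall, up to 1.64) can be read as reductions in execution time. The sketch below is illustrative only: it assumes the usual definition of speedup as baseline time divided by new time, which the record itself does not spell out, and the function name `runtime_reduction` is hypothetical.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of execution time saved for a given speedup.

    Assumes speedup = T_base / T_new, so T_new / T_base = 1 / speedup.
    This definition is an assumption; the record does not state it.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Figures taken from the description field of this record.
    for label, s in [("overall", 1.17), ("best case", 1.64)]:
        print(f"{label}: speedup {s} -> {runtime_reduction(s):.1%} less time")
```

Under that assumption, a speedup of 1.17 corresponds to roughly 14.5% less execution time, and 1.64 to roughly 39.0%.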