Dynamically allocating processor resources between nearby and distant ILP

Bibliographic details
Main authors: Balasubramonian, R.; Dwarkadas, S.; Albonesi, D.H.
Format: Conference proceeding
Language: English
container_end_page 37
container_issue
container_start_page 26
container_title
container_volume
creator Balasubramonian, R.
Dwarkadas, S.
Albonesi, D.H.
description Modern superscalar processors use wide instruction issue widths and out-of-order execution to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. At the same time, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limits imposed by a register file whose size is dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredictions early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
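The description says only that available registers are dynamically allocated between the primary thread and the future thread; it does not say how the split is chosen. As a minimal sketch under stated assumptions, the Python below models one simple policy: sample a few candidate splits, one interval each, and keep whichever yields the highest IPC, trading registers for nearby ILP (primary thread) against registers for distant ILP (future thread). The function `choose_future_registers`, the `measure_ipc` hook, the candidate list, and the `toy_ipc` curve are all hypothetical illustrations, not the mechanism or the numbers from the paper.

```python
from typing import Callable, Sequence, Tuple


def choose_future_registers(
    total_regs: int,
    candidates: Sequence[int],
    measure_ipc: Callable[[int, int], float],
) -> Tuple[int, int]:
    """Return a (primary_regs, future_regs) split of a shared register file.

    Hypothetical exploration policy: run one sampling interval per candidate
    future-thread allocation and keep the split with the highest measured IPC.
    `measure_ipc(primary_regs, future_regs)` is an assumed hook standing in
    for hardware performance counters; it is not an interface from the paper.
    """
    # Baseline: every register goes to the primary thread (no future thread).
    best_split = (total_regs, 0)
    best_ipc = measure_ipc(total_regs, 0)
    for future_regs in candidates:
        # Skip the baseline (already measured) and any split that would
        # leave the primary thread with no registers.
        if not 0 < future_regs < total_regs:
            continue
        primary_regs = total_regs - future_regs
        ipc = measure_ipc(primary_regs, future_regs)
        # Strict comparison: on a tie, keep the split found earlier.
        if ipc > best_ipc:
            best_split, best_ipc = (primary_regs, future_regs), ipc
    return best_split


def toy_ipc(primary_regs: int, future_regs: int) -> float:
    """Made-up IPC curve for illustration only: the primary thread's benefit
    saturates at 64 registers, the future thread's at 32 registers."""
    return min(primary_regs, 64) / 32 + min(future_regs, 32) / 64


if __name__ == "__main__":
    print(choose_future_registers(128, [0, 16, 32, 48, 64], toy_ipc))  # (96, 32)
```

With the made-up `toy_ipc` curve, the primary thread gains nothing beyond 64 registers, so the sketch moves 32 registers to the future thread and returns `(96, 32)`; larger future-thread allocations tie at the same IPC, and the strict `>` keeps the earlier, smaller one.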
doi_str_mv 10.1109/ISCA.2001.937428
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_937428</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>937428</ieee_id><sourcerecordid>937428</sourcerecordid><originalsourceid>FETCH-ieee_primary_9374283</originalsourceid><addsrcrecordid>eNp9jj0LwjAYhF_8AFt1F6f8gdaktYkZxQ8sOAg6uJW0vkqkppJUpP_egs4udxzPHRzAhNGQMSpn6XG1DCNKWShjMY8WHfCiRCSBYPG5Cz4VXCaM8Uj0wGOUxwFfSDEA37l7O5Iy4R6k68aohy5UWTaklapQtTY38rRVgc5Vllh01cu2geRYvxENMahs3rbNhVy0q5WpSbo_jKB_VaXD8c-HMN1uTqtdoBExe1r9ULbJvk_jv_ADhzA_VA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Dynamically allocating processor resources between nearby and distant ILP</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H.</creator><creatorcontrib>Balasubramonian, R. ; Dwarkadas, S. ; Albonesi, D.H.</creatorcontrib><description>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. 
It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</description><identifier>ISSN: 1063-6897</identifier><identifier>ISBN: 0769511627</identifier><identifier>ISBN: 9780769511627</identifier><identifier>EISSN: 2575-713X</identifier><identifier>DOI: 10.1109/ISCA.2001.937428</identifier><language>eng</language><publisher>IEEE</publisher><subject>Degradation ; Delay ; Hardware ; Microarchitecture ; Out of order ; Parallel processing ; Registers ; Resource management ; Time factors ; Yarn</subject><ispartof>Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.26-37</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/937428$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,4036,4037,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/937428$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Balasubramonian, R.</creatorcontrib><creatorcontrib>Dwarkadas, S.</creatorcontrib><creatorcontrib>Albonesi, D.H.</creatorcontrib><title>Dynamically allocating processor resources between nearby and distant ILP</title><title>Proceedings 28th Annual International Symposium on Computer Architecture</title><addtitle>ISCA</addtitle><description>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to 
increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. 
The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</description><subject>Degradation</subject><subject>Delay</subject><subject>Hardware</subject><subject>Microarchitecture</subject><subject>Out of order</subject><subject>Parallel processing</subject><subject>Registers</subject><subject>Resource management</subject><subject>Time factors</subject><subject>Yarn</subject><issn>1063-6897</issn><issn>2575-713X</issn><isbn>0769511627</isbn><isbn>9780769511627</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2001</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNp9jj0LwjAYhF_8AFt1F6f8gdaktYkZxQ8sOAg6uJW0vkqkppJUpP_egs4udxzPHRzAhNGQMSpn6XG1DCNKWShjMY8WHfCiRCSBYPG5Cz4VXCaM8Uj0wGOUxwFfSDEA37l7O5Iy4R6k68aohy5UWTaklapQtTY38rRVgc5Vllh01cu2geRYvxENMahs3rbNhVy0q5WpSbo_jKB_VaXD8c-HMN1uTqtdoBExe1r9ULbJvk_jv_ADhzA_VA</recordid><startdate>2001</startdate><enddate>2001</enddate><creator>Balasubramonian, R.</creator><creator>Dwarkadas, S.</creator><creator>Albonesi, D.H.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>2001</creationdate><title>Dynamically allocating processor resources between nearby and distant ILP</title><author>Balasubramonian, R. ; Dwarkadas, S. 
; Albonesi, D.H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ieee_primary_9374283</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Degradation</topic><topic>Delay</topic><topic>Hardware</topic><topic>Microarchitecture</topic><topic>Out of order</topic><topic>Parallel processing</topic><topic>Registers</topic><topic>Resource management</topic><topic>Time factors</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Balasubramonian, R.</creatorcontrib><creatorcontrib>Dwarkadas, S.</creatorcontrib><creatorcontrib>Albonesi, D.H.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Balasubramonian, R.</au><au>Dwarkadas, S.</au><au>Albonesi, D.H.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Dynamically allocating processor resources between nearby and distant ILP</atitle><btitle>Proceedings 28th Annual International Symposium on Computer Architecture</btitle><stitle>ISCA</stitle><date>2001</date><risdate>2001</risdate><spage>26</spage><epage>37</epage><pages>26-37</pages><issn>1063-6897</issn><eissn>2575-713X</eissn><isbn>0769511627</isbn><isbn>9780769511627</isbn><abstract>Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). 
Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.</abstract><pub>IEEE</pub><doi>10.1109/ISCA.2001.937428</doi></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6897
ispartof Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.26-37
issn 1063-6897
2575-713X
language eng
recordid cdi_ieee_primary_937428
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Degradation
Delay
Hardware
Microarchitecture
Out of order
Parallel processing
Registers
Resource management
Time factors
Yarn
title Dynamically allocating processor resources between nearby and distant ILP
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A39%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Dynamically%20allocating%20processor%20resources%20between%20nearby%20and%20distant%20ILP&rft.btitle=Proceedings%2028th%20Annual%20International%20Symposium%20on%20Computer%20Architecture&rft.au=Balasubramonian,%20R.&rft.date=2001&rft.spage=26&rft.epage=37&rft.pages=26-37&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1109/ISCA.2001.937428&rft_dat=%3Cieee_6IE%3E937428%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=937428&rfr_iscdi=true