Dynamically allocating processor resources between nearby and distant ILP

Bibliographic details
Main authors:	Balasubramonian, R., Dwarkadas, S., Albonesi, D.H.
Format:	Conference proceeding
Language:	English
Subject terms:	Degradation Delay Hardware Microarchitecture Out of order Parallel processing Registers Resource management Time factors Yarn

container_end_page	37
container_issue
container_start_page	26
container_title
container_volume
creator	Balasubramonian, R. Dwarkadas, S. Albonesi, D.H.
description	Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. At the same time, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements. In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredictions early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
doi_str_mv	10.1109/ISCA.2001.937428
format	Conference Proceeding
isbn	0769511627 9780769511627
publisher	IEEE
fulltext	fulltext_linktorsrc
identifier	ISSN: 1063-6897
ispartof	Proceedings 28th Annual International Symposium on Computer Architecture, 2001, p.26-37
issn	1063-6897 2575-713X
language	eng
recordid	cdi_ieee_primary_937428
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Degradation Delay Hardware Microarchitecture Out of order Parallel processing Registers Resource management Time factors Yarn
title	Dynamically allocating processor resources between nearby and distant ILP
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A39%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Dynamically%20allocating%20processor%20resources%20between%20nearby%20and%20distant%20ILP&rft.btitle=Proceedings%2028th%20Annual%20International%20Symposium%20on%20Computer%20Architecture&rft.au=Balasubramonian,%20R.&rft.date=2001&rft.spage=26&rft.epage=37&rft.pages=26-37&rft.issn=1063-6897&rft.eissn=2575-713X&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1109/ISCA.2001.937428&rft_dat=%3Cieee_6IE%3E937428%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=937428&rfr_iscdi=true
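The description field above outlines a design in which a fixed pool of physical registers is shared dynamically between a primary thread and a future thread, with the future thread running when the primary thread is limited by resource availability. The Python sketch below is only a toy model of that sharing idea, under a loudly stated assumption: its allocation policy (the primary thread always has priority and reclaims registers from the future thread on demand) is hypothetical and chosen for illustration. This record does not describe the authors' actual allocation algorithm, and the names `RegisterPool`, `alloc_primary`, `alloc_future`, `release_primary`, and `release_future` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class RegisterPool:
    """Toy model of a physical register file shared by two threads.

    HYPOTHETICAL policy for illustration only, not the paper's algorithm:
    the primary thread has priority; the future thread may only use
    registers the primary thread leaves free and must yield them on demand.
    """
    total: int
    primary_used: int = 0
    future_used: int = 0

    @property
    def free(self) -> int:
        # Registers currently held by neither thread.
        return self.total - self.primary_used - self.future_used

    def alloc_primary(self, n: int) -> int:
        """Grant up to n registers to the primary thread, reclaiming from
        the future thread if the free pool is short. Returns the count granted."""
        shortfall = max(0, n - self.free)
        reclaimed = min(shortfall, self.future_used)
        self.future_used -= reclaimed
        granted = min(n, self.free)
        self.primary_used += granted
        return granted

    def alloc_future(self, n: int) -> int:
        """The future thread only takes registers that are currently free."""
        granted = min(n, self.free)
        self.future_used += granted
        return granted

    def release_primary(self, n: int) -> None:
        self.primary_used -= min(n, self.primary_used)

    def release_future(self, n: int) -> None:
        self.future_used -= min(n, self.future_used)


if __name__ == "__main__":
    pool = RegisterPool(total=8)
    print(pool.alloc_primary(5))  # 5: the primary thread gets what it asks for
    print(pool.alloc_future(6))   # 3: the future thread only gets the 3 free registers
    print(pool.alloc_primary(2))  # 2: both reclaimed from the future thread
    print(pool.primary_used, pool.future_used)  # 7 1
```

Treating the future thread as a lower-priority consumer of whatever registers are free is one simple way to express that it only runs when the primary thread cannot use those resources. A real design would also have to model renaming, in-order commit for the primary thread, and how the future thread's results warm up the primary thread's register file, caches, and instruction reuse buffer; this sketch attempts none of that.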