Exploiting data-width locality to increase superscalar execution bandwidth
In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit very strong temporal locality and describe mechanisms to accurately predict data-widths. To exploit the predictability of...
Main Author: | |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
container_end_page | 405 |
---|---|
container_issue | |
container_start_page | 395 |
container_title | |
container_volume | |
creator | Loh, G.H. |
description | In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit very strong temporal locality and describe mechanisms to accurately predict data-widths. To exploit the predictability of data-widths, we propose a Multi-Bit-Width (MBW) microarchitecture which, when the opportunity arises, takes the wires normally used to route the operands and bypass the result of a 64-bit instruction, and instead uses them for multiple narrow-width instructions. This technique increases the effective issue width without adding many additional wires by reusing already existing datapaths. Compared to a traditional four-wide superscalar processor, our best MBW configuration with a peak issue rate of eight IPC achieves a 7.1% speedup on the simulated SPECint2000 benchmarks, which compares very well with the 7.9% speedup attainable by a processor with a perfect data-width predictor. |
doi_str_mv | 10.1109/MICRO.2002.1176266 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1072-4451 |
ispartof | 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings, 2002, p.395-405 |
issn | 1072-4451 |
language | eng |
recordid | cdi_ieee_primary_1176266 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Bandwidth; Clocks; Computer science; Delay effects; Microarchitecture; Pipelines; Predictive models; Processor scheduling; Wires |
title | Exploiting data-width locality to increase superscalar execution bandwidth |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T09%3A23%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Exploiting%20data-width%20locality%20to%20increase%20superscalar%20execution%20bandwidth&rft.btitle=35th%20Annual%20IEEE/ACM%20International%20Symposium%20on%20Microarchitecture,%202002.%20(MICRO-35).%20Proceedings&rft.au=Loh,%20G.H.&rft.date=2002&rft.spage=395&rft.epage=405&rft.pages=395-405&rft.issn=1072-4451&rft.isbn=9780769518596&rft.isbn_list=0769518591&rft_id=info:doi/10.1109/MICRO.2002.1176266&rft_dat=%3Cieee_6IE%3E1176266%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1176266&rfr_iscdi=true |
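
The description above rests on the claim that instruction data-widths show strong temporal locality and can therefore be predicted. The following is a minimal sketch of that idea only, not the paper's mechanism: the 16-bit threshold (`NARROW_BITS`), the `LastWidthPredictor` class, and the per-PC "last outcome" policy are all assumptions introduced here for illustration, since the abstract does not specify the predictor design or the width classes used.

```python
from typing import Dict, Iterable, Tuple

# Assumed cutoff for a "narrow" value; the abstract does not state one.
NARROW_BITS = 16


def data_width(value: int) -> int:
    """Minimum number of bits needed to hold `value` in two's complement."""
    # For negative values, ~value drops the run of leading sign bits.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1  # +1 for the sign bit


class LastWidthPredictor:
    """Hypothetical per-PC predictor: guess that an instruction's next
    result is narrow if its previous result was narrow."""

    def __init__(self) -> None:
        self.last_narrow: Dict[int, bool] = {}

    def predict(self, pc: int) -> bool:
        # Unseen instructions default to wide, the conservative choice.
        return self.last_narrow.get(pc, False)

    def update(self, pc: int, result: int) -> None:
        self.last_narrow[pc] = data_width(result) <= NARROW_BITS


def accuracy(trace: Iterable[Tuple[int, int]]) -> float:
    """Fraction of (pc, result) pairs whose narrow/wide class was predicted."""
    predictor = LastWidthPredictor()
    correct = 0
    total = 0
    for pc, result in trace:
        actual = data_width(result) <= NARROW_BITS
        correct += predictor.predict(pc) == actual
        predictor.update(pc, result)
        total += 1
    return correct / total


if __name__ == "__main__":
    # Toy trace: one instruction that keeps producing small values and
    # one that keeps producing values needing more than 16 bits.
    trace = [(0x40, 5), (0x40, 7), (0x40, 9), (0x44, 1 << 40), (0x44, 1 << 41)]
    print(f"prediction accuracy: {accuracy(trace):.2f}")
```

On a trace where each instruction keeps producing values of the same width class, this kind of predictor is wrong only on the first narrow result it sees per instruction, which is the sort of temporal locality the description refers to. A real design would also have to handle mispredictions, which this sketch ignores.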