Author retrospective for software trace cache

Bibliographic Details
Main Authors: Ramirez, Alex; Falcon, Ayose J.; Santana, Oliverio J.; Valero, Mateo
Format: Conference Proceeding
Language: English
container_end_page 47
container_start_page 45
creator Ramirez, Alex
Falcon, Ayose J.
Santana, Oliverio J.
Valero, Mateo
description In superscalar processors, which can issue and execute multiple instructions per cycle, fetch performance is an upper bound on overall processor performance. Unless there is some form of instruction reuse mechanism, you cannot execute instructions faster than you can fetch them. Instruction-level parallelism (ILP), embodied in wide-issue out-of-order superscalar processors, was the hot research topic of the late 1990s and early 2000s. It is indeed the most promising way to keep improving processor performance without affecting application development, unlike current multicore architectures, which require parallelizing applications (a process that is still far from automated in the general case). Widening superscalar issue promised never-ending improvements in single-thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer on "Billion transistor processors" [1]. However, instruction fetch performance is limited by the program's control flow. A basic fetch stage reads instructions from a single cache line, starting at the current fetch address and stopping at the next control-flow instruction; that is, it fetches at most one basic block per cycle. Since the typical basic block in the SPEC integer benchmarks is 4-6 instructions long, fetch was limited to those same 4-6 instructions per cycle, which made 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms that fetch more than 8 instructions per cycle, and that meant fetching more than one basic block per cycle.
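The arithmetic in the description (at most one basic block per cycle, 4-6 instructions per block, hence roughly 4-6 instructions fetched per cycle however wide the machine is) can be checked with a toy model. This is a hypothetical sketch, not the authors' simulator or the software trace cache itself: it assumes perfect branch prediction, ignores the single-cache-line limit, and represents each dynamic basic block only by its length. The function names and the block sizes are illustrative assumptions. The "multi-block" variant is an idealized upper bound for fetch mechanisms that cross basic-block boundaries, and it may split a block across two cycles.

```python
"""Toy fetch-bandwidth model (hypothetical illustration, not the authors' simulator).

Assumptions: perfect branch prediction, no cache-line or alignment limits,
and each dynamic basic block is represented only by its length.
"""


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def fetch_cycles_one_block(block_sizes: list[int], width: int) -> int:
    """Baseline fetch: at most one basic block per cycle, and at most
    `width` instructions per cycle (a longer block spills into extra cycles)."""
    return sum(ceil_div(size, width) for size in block_sizes)


def fetch_cycles_multi_block(block_sizes: list[int], width: int) -> int:
    """Idealized multi-block fetch (an upper bound for trace-cache-style
    designs): consecutive basic blocks are packed into each cycle until
    `width` instructions have been fetched."""
    return ceil_div(sum(block_sizes), width)


def fetch_ipc(block_sizes: list[int], cycles: int) -> float:
    """Instructions fetched per cycle."""
    return sum(block_sizes) / cycles


if __name__ == "__main__":
    # Hypothetical dynamic trace: eight basic blocks of 4-6 instructions each.
    blocks = [5, 4, 6, 5, 5, 4, 6, 5]
    for width in (8, 16):
        one = fetch_cycles_one_block(blocks, width)
        multi = fetch_cycles_multi_block(blocks, width)
        print(f"width={width}: one block/cycle IPC={fetch_ipc(blocks, one):.2f}, "
              f"multi-block IPC={fetch_ipc(blocks, multi):.2f}")
```

With these assumed block sizes (40 instructions in 8 blocks), the one-block baseline fetches 5.00 instructions per cycle at both widths, while idealized multi-block fetch reaches 8.00 at width 8 and 13.33 at width 16. Widening the machine only pays off once fetch can cross basic-block boundaries.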
doi_str_mv 10.1145/2591635.2594508
format Conference Proceeding
contributor Banerjee, Utpal
publisher New York, NY, USA: ACM
date 2014-06-10
isbn 1450328407
9781450328401
9781450326421
1450326420
rights 2014 Owner/Author; open access
linktorsrc https://recercat.cat/handle/2072/250711
identifier ISBN: 1450328407
ispartof ACM International Conference on Supercomputing 25th Anniversary Volume, 2014, p.45-47
language eng
recordid cdi_csuc_recercat_oai_recercat_cat_2072_250711
source Recercat
subjects Binary translation
Compilers (Computer programs)
ILP
Computer science
Instruction fetch
Programming languages (Electronic computers)
Software and its engineering -- Software notations and tools -- Compilers
Software and its engineering -- Software notations and tools -- Compilers -- Source code generation
Superscalar processors
UPC subject areas
title Author retrospective for software trace cache
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T11%3A53%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-csuc_XX2&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Author%20retrospective%20for%20software%20trace%20cache&rft.btitle=ACM%20International%20Conference%20on%20Supercomputing%2025th%20Anniversary%20Volume&rft.au=Ramirez,%20Alex&rft.date=2014-06-10&rft.spage=45&rft.epage=47&rft.pages=45-47&rft.isbn=1450328407&rft.isbn_list=9781450328401&rft.isbn_list=9781450326421&rft.isbn_list=1450326420&rft_id=info:doi/10.1145/2591635.2594508&rft_dat=%3Ccsuc_XX2%3Eoai_recercat_cat_2072_250711%3C/csuc_XX2%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true