Author retrospective for software trace cache

Bibliographic details

Main authors:	Ramirez, Alex; Falcon, Ayose J.; Santana, Oliverio J.; Valero, Mateo
Format:	Conference proceeding
Language:	English
Subjects:	Binary translation; Compilers (Computer programs); ILP; Computer science; Instruction fetch; Programming languages; Programming languages (Electronic computers); Software and its engineering > Software notations and tools > Compilers; Software and its engineering > Software notations and tools > Compilers > Source code generation; Superscalar processors; UPC subject areas

container_end_page	47
container_start_page	45
creator	Ramirez, Alex; Falcon, Ayose J.; Santana, Oliverio J.; Valero, Mateo
description	In superscalar processors, which can issue and execute multiple instructions per cycle, fetch performance sets an upper bound on overall processor performance. Unless there is some form of instruction reuse mechanism, you cannot execute instructions faster than you can fetch them. Instruction-level parallelism (ILP), as exploited by wide-issue out-of-order superscalar processors, was the hot topic of the late 1990s and early 2000s. It is indeed the most promising way to keep improving processor performance without affecting application development, unlike current multicore architectures, which require the applications to be parallelized (a process that is still far from automated in the general case). Widening superscalar issue promised never-ending improvements in single-thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer on "Billion transistor processors" [1]. However, instruction fetch performance is limited by the control flow of the program. A basic fetch stage can read instructions from only a single cache line, starting at the current fetch address and ending at the next control-flow instruction: at most one basic block per cycle. Given that the typical basic block in the SPEC integer benchmarks is 4-6 instructions long, fetch performance was limited to those same 4-6 instructions per cycle, making 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms that could fetch more than 8 instructions per cycle, which meant fetching more than one basic block per cycle.
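As a rough illustration of the fetch bound described in the abstract, the Python sketch below counts fetch cycles for a stream of basic blocks under two hypothetical fetch units: one that stops at the end of every basic block (`max_blocks=1`) and one that may deliver several consecutive blocks per cycle, which is what trace-cache style fetch aims for. The function names, the uniform 5-instruction block size, and the omission of branch mispredictions and cache misses are simplifying assumptions for illustration only; this is not the authors' model or the software trace cache itself.

```python
from typing import List


def fetch_cycles(block_sizes: List[int], width: int, max_blocks: int) -> int:
    """Count cycles needed to fetch a dynamic sequence of basic blocks.

    Hypothetical model: each cycle delivers at most `width` instructions
    from at most `max_blocks` consecutive basic blocks. A block longer
    than `width` spills over into the following cycle(s). Branch
    mispredictions and cache misses are ignored.
    """
    if width < 1 or max_blocks < 1:
        raise ValueError("width and max_blocks must be positive")
    cycles = 0
    i = 0       # index of the basic block currently being fetched
    offset = 0  # instructions of block i already fetched
    while i < len(block_sizes):
        cycles += 1
        slots = width
        blocks_this_cycle = 0
        while i < len(block_sizes) and slots > 0 and blocks_this_cycle < max_blocks:
            remaining = block_sizes[i] - offset
            take = min(slots, remaining)
            slots -= take
            blocks_this_cycle += 1
            if take == remaining:
                # Block finished; the next block starts in a free slot or next cycle.
                i += 1
                offset = 0
            else:
                # Fetch width exhausted mid-block; resume here next cycle.
                offset += take
    return cycles


def fetch_rate(block_sizes: List[int], width: int, max_blocks: int) -> float:
    """Average instructions fetched per cycle under the model above."""
    cycles = fetch_cycles(block_sizes, width, max_blocks)
    return sum(block_sizes) / cycles if cycles else 0.0


if __name__ == "__main__":
    # Illustrative stream: 100 basic blocks of 5 instructions each,
    # within the 4-6 instruction range cited for SPEC integer code.
    stream = [5] * 100
    print(f"{fetch_rate(stream, 16, 1):.2f}")  # one block per cycle: 5.00
    print(f"{fetch_rate(stream, 16, 3):.2f}")  # up to three blocks: 14.71
```

Under these assumptions, a 16-wide fetch unit limited to one basic block per cycle delivers only 5 instructions per cycle, however wide it is; allowing it to span three blocks per cycle raises the rate to about 14.7.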
doi_str_mv	10.1145/2591635.2594508
format	Conference Proceeding
contributor	Banerjee, Utpal
publisher	New York, NY, USA: ACM
date	2014-06-10
isbn	1450328407; 9781450328401; 9781450326421; 1450326420
rights	2014 Owner/Author; open access (free to read)
linktorsrc	https://recercat.cat/handle/2072/250711 (view record in Consorci de Serveis Universitaris de Catalunya (CSUC))
identifier	ISBN: 1450328407
ispartof	ACM International Conference on Supercomputing 25th Anniversary Volume, 2014, p.45-47
language	eng
recordid	cdi_csuc_recercat_oai_recercat_cat_2072_250711
source	Recercat
subjects	Binary translation; Compilers (Computer programs); ILP; Computer science; Instruction fetch; Programming languages; Programming languages (Electronic computers); Software and its engineering -- Software notations and tools -- Compilers; Software and its engineering -- Software notations and tools -- Compilers -- Source code generation; Superscalar processors; UPC subject areas
title	Author retrospective for software trace cache
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T11%3A53%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-csuc_XX2&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Author%20retrospective%20for%20software%20trace%20cache&rft.btitle=ACM%20International%20Conference%20on%20Supercomputing%2025th%20Anniversary%20Volume&rft.au=Ramirez,%20Alex&rft.date=2014-06-10&rft.spage=45&rft.epage=47&rft.pages=45-47&rft.isbn=1450328407&rft.isbn_list=9781450328401&rft.isbn_list=9781450326421&rft.isbn_list=1450326420&rft_id=info:doi/10.1145/2591635.2594508&rft_dat=%3Ccsuc_XX2%3Eoai_recercat_cat_2072_250711%3C/csuc_XX2%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true