Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors

Neural-inspired branch predictors achieve very low branch misprediction rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original perceptron branch predictor suffers...

Full description


Bibliographic details
Main authors:	Jimenez, D.A. ; Loh, G.H.
Format:	Conference proceeding
Language:	English
Keywords:	Accuracy ; Checkpointing ; Computer science ; Delay ; Educational institutions ; Energy consumption ; History ; Machine learning algorithms ; Pipeline processing ; Random access memory
Online access:	Order full text

container_end_page	62
container_issue
container_start_page	55
container_title
container_volume
creator	Jimenez, D.A. ; Loh, G.H.
description	Neural-inspired branch predictors achieve very low branch misprediction rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original perceptron branch predictor suffers from a long access latency, and the faster path-based neural predictor (PBNP) requires deep pipelining and additional area to support checkpointing for misprediction recovery. The complexity of the PBNP predictor stems from the fact that the path history length, which determines the number of tables and pipeline stages, is equal to the outcome history length, which is typically very long for high accuracy. We propose to decouple the path-history length from the outcome-history length through a new technique called modulo-path history. By allowing a shorter path history, we can implement a PBNP with significantly fewer tables and pipeline stages while still exploiting a traditional long branch outcome history. The pipeline length reduction results in decreased power and implementation complexity. We also propose folded modulo-path history to allow the number of pipeline stages to differ from the path history length. We show that our modulo-path PBNP at 8 KB can achieve prediction accuracy and overall performance within 0.8% (SPECint) of the original PBNP while simultaneously reducing predictor energy consumption by ~29% per access and predictor die area by ~35%. Our folded modulo-path history PBNP achieves performance within 1.3% of ideal, with a ~37% energy reduction and ~36% predictor area reduction.
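The modulo-path idea in the description can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes that outcome-history position i selects its weight using the address of the (i mod P)-th most recent branch, so that only P tables are needed for an outcome history of length h > P. The names `make_tables` and `modulo_path_predict`, the table layout, and the row index (address modulo row count) are hypothetical; pipelining, training, and the folded variant are omitted.

```python
from typing import List, Tuple

Tables = List[List[List[int]]]  # tables[j][row][k], an assumed layout


def make_tables(path_len: int, rows: int, hist_len: int) -> Tables:
    """Allocate path_len tables of zero weights (hypothetical helper).

    Each row holds ceil(hist_len / path_len) weights, one for every
    history position that maps to the same table.
    """
    slots = -(-hist_len // path_len)  # ceiling division
    return [[[0] * slots for _ in range(rows)] for _ in range(path_len)]


def modulo_path_predict(tables: Tables, bias: int, path: List[int],
                        outcomes: List[bool]) -> Tuple[bool, int]:
    """Return (predicted_taken, perceptron_sum).

    path[j]     : address of the j-th most recent branch (only P are kept)
    outcomes[i] : outcome of the i-th most recent branch (h may exceed P)
    """
    path_len = len(tables)
    rows = len(tables[0])
    if len(path) < path_len:
        raise ValueError("need one path address per table")
    total = bias
    for i, taken in enumerate(outcomes):
        j = i % path_len          # table chosen by position modulo P (assumption)
        k = i // path_len         # weight slot within the selected row
        row = path[j] % rows      # simple index hash (assumption)
        x = 1 if taken else -1    # bipolar encoding of the outcome
        total += tables[j][row][k] * x
    return total >= 0, total
```

With P = 2 tables and an outcome history of h = 6, the sketch consults two rows instead of six, which is the kind of table and stage reduction the description attributes to a shorter path history.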
doi_str_mv	10.1109/SBAC-PAD.2006.14
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4032416</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4032416</ieee_id><sourcerecordid>4032416</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-62246034bf7ba3b381ef392e792bed00267c6f1daf05de1c360777ea791d98d63</originalsourceid><addsrcrecordid>eNotjMtOwzAURC0eEqV0j8TGP5By_YhdL9PwqlRBJGBdOc5Na5TElWNA_D2RYDaj0cwZQq4ZLBkDc_u6LsqsKu6WHEAtmTwhM66kyAQAOyWXoJXJuQYpzsiM5TlkKhfigizG8QMmCSPByBn5KsOQYug6P-xpOiCtwjdGaoeGFhEtDS19xs9oO7qOdnAHWkVsvEshjrQNcYrWJe-mftMfO-xxSDb5MFA_0Ce_P2QVxmnXT-z0HYPDcZzYK3Le2m7Exb_PyfvD_Vv5lG1fHjdlsc0803nKFOdSgZB1q2srarFi2ArDURteYwPAlXaqZY1tIW-QOaFAa41WG9aYVaPEnNz8_XpE3B2j72382UkQXDIlfgE1g17Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Jimenez, D.A. ; Loh, G.H.</creator><creatorcontrib>Jimenez, D.A. ; Loh, G.H.</creatorcontrib><description>Neural-inspired branch predictors achieve very low branch misprediction rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original perceptron branch predictor suffers from a long access latency, and the faster path-based neural predictor (PBNP) requires deep pipelining and additional area to support checkpointing for mis-prediction recovery. The complexity of the PBNP predictor stems from the fact that the path history length, which determines the number of tables and pipeline stages, is equal to the history length, which is typically very long for high accuracy. 
We propose to decouple the path-history length from the outcome-history length through a new technique called modulo-path history. By allowing a shorter path history, we can implement a PBNP with significantly fewer tables and pipeline stages while still exploiting a traditional long branch outcome history. The pipeline length reduction results in decreased power and implementation complexity. We also propose folded modulo-path history to allow the number of pipeline stages to differ from the path history length. We show that our modulo-path PBNP at 8KB can achieve prediction accuracy and overall performance within 0.8% (SPECint) of the original PBNP while simultaneously reducing predictor energy consumption by ~29% per access and predictor die area by ~35%. Our folded modulo-path history PBNP achieves performance within 1.3% of ideal, with a ~37% energy reduction and ~36% predictor area reduction</description><identifier>ISSN: 1550-6533</identifier><identifier>ISBN: 0769527043</identifier><identifier>ISBN: 9780769527048</identifier><identifier>EISSN: 2643-3001</identifier><identifier>DOI: 10.1109/SBAC-PAD.2006.14</identifier><language>eng</language><publisher>IEEE</publisher><subject>Accuracy ; Checkpointing ; Computer science ; Delay ; Educational institutions ; Energy consumption ; History ; Machine learning algorithms ; Pipeline processing ; Random access memory</subject><ispartof>2006 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'06), 2006, 
p.55-62</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4032416$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4032416$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jimenez, D.A.</creatorcontrib><creatorcontrib>Loh, G.H.</creatorcontrib><title>Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors</title><title>2006 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'06)</title><addtitle>SBACPAD</addtitle><description>Neural-inspired branch predictors achieve very low branch misprediction rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original perceptron branch predictor suffers from a long access latency, and the faster path-based neural predictor (PBNP) requires deep pipelining and additional area to support checkpointing for mis-prediction recovery. The complexity of the PBNP predictor stems from the fact that the path history length, which determines the number of tables and pipeline stages, is equal to the history length, which is typically very long for high accuracy. We propose to decouple the path-history length from the outcome-history length through a new technique called modulo-path history. By allowing a shorter path history, we can implement a PBNP with significantly fewer tables and pipeline stages while still exploiting a traditional long branch outcome history. 
The pipeline length reduction results in decreased power and implementation complexity. We also propose folded modulo-path history to allow the number of pipeline stages to differ from the path history length. We show that our modulo-path PBNP at 8KB can achieve prediction accuracy and overall performance within 0.8% (SPECint) of the original PBNP while simultaneously reducing predictor energy consumption by ~29% per access and predictor die area by ~35%. Our folded modulo-path history PBNP achieves performance within 1.3% of ideal, with a ~37% energy reduction and ~36% predictor area reduction</description><subject>Accuracy</subject><subject>Checkpointing</subject><subject>Computer science</subject><subject>Delay</subject><subject>Educational institutions</subject><subject>Energy consumption</subject><subject>History</subject><subject>Machine learning algorithms</subject><subject>Pipeline processing</subject><subject>Random access memory</subject><issn>1550-6533</issn><issn>2643-3001</issn><isbn>0769527043</isbn><isbn>9780769527048</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotjMtOwzAURC0eEqV0j8TGP5By_YhdL9PwqlRBJGBdOc5Na5TElWNA_D2RYDaj0cwZQq4ZLBkDc_u6LsqsKu6WHEAtmTwhM66kyAQAOyWXoJXJuQYpzsiM5TlkKhfigizG8QMmCSPByBn5KsOQYug6P-xpOiCtwjdGaoeGFhEtDS19xs9oO7qOdnAHWkVsvEshjrQNcYrWJe-mftMfO-xxSDb5MFA_0Ce_P2QVxmnXT-z0HYPDcZzYK3Le2m7Exb_PyfvD_Vv5lG1fHjdlsc0803nKFOdSgZB1q2srarFi2ArDURteYwPAlXaqZY1tIW-QOaFAa41WG9aYVaPEnNz8_XpE3B2j72382UkQXDIlfgE1g17Q</recordid><startdate>200610</startdate><enddate>200610</enddate><creator>Jimenez, D.A.</creator><creator>Loh, G.H.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200610</creationdate><title>Controlling the Power and Area of Neural Branch Predictors for 
Practical Implementation in High-Performance Processors</title><author>Jimenez, D.A. ; Loh, G.H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-62246034bf7ba3b381ef392e792bed00267c6f1daf05de1c360777ea791d98d63</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Accuracy</topic><topic>Checkpointing</topic><topic>Computer science</topic><topic>Delay</topic><topic>Educational institutions</topic><topic>Energy consumption</topic><topic>History</topic><topic>Machine learning algorithms</topic><topic>Pipeline processing</topic><topic>Random access memory</topic><toplevel>online_resources</toplevel><creatorcontrib>Jimenez, D.A.</creatorcontrib><creatorcontrib>Loh, G.H.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jimenez, D.A.</au><au>Loh, G.H.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors</atitle><btitle>2006 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'06)</btitle><stitle>SBACPAD</stitle><date>2006-10</date><risdate>2006</risdate><spage>55</spage><epage>62</epage><pages>55-62</pages><issn>1550-6533</issn><eissn>2643-3001</eissn><isbn>0769527043</isbn><isbn>9780769527048</isbn><abstract>Neural-inspired branch predictors achieve very low branch misprediction 
rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original perceptron branch predictor suffers from a long access latency, and the faster path-based neural predictor (PBNP) requires deep pipelining and additional area to support checkpointing for mis-prediction recovery. The complexity of the PBNP predictor stems from the fact that the path history length, which determines the number of tables and pipeline stages, is equal to the history length, which is typically very long for high accuracy. We propose to decouple the path-history length from the outcome-history length through a new technique called modulo-path history. By allowing a shorter path history, we can implement a PBNP with significantly fewer tables and pipeline stages while still exploiting a traditional long branch outcome history. The pipeline length reduction results in decreased power and implementation complexity. We also propose folded modulo-path history to allow the number of pipeline stages to differ from the path history length. We show that our modulo-path PBNP at 8KB can achieve prediction accuracy and overall performance within 0.8% (SPECint) of the original PBNP while simultaneously reducing predictor energy consumption by ~29% per access and predictor die area by ~35%. Our folded modulo-path history PBNP achieves performance within 1.3% of ideal, with a ~37% energy reduction and ~36% predictor area reduction</abstract><pub>IEEE</pub><doi>10.1109/SBAC-PAD.2006.14</doi><tpages>8</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1550-6533
ispartof	2006 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'06), 2006, p.55-62
issn	1550-6533 2643-3001
language	eng
recordid	cdi_ieee_primary_4032416
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Accuracy ; Checkpointing ; Computer science ; Delay ; Educational institutions ; Energy consumption ; History ; Machine learning algorithms ; Pipeline processing ; Random access memory
title	Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T09%3A26%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Controlling%20the%20Power%20and%20Area%20of%20Neural%20Branch%20Predictors%20for%20Practical%20Implementation%20in%20High-Performance%20Processors&rft.btitle=2006%2018th%20International%20Symposium%20on%20Computer%20Architecture%20and%20High%20Performance%20Computing%20(SBAC-PAD'06)&rft.au=Jimenez,%20D.A.&rft.date=2006-10&rft.spage=55&rft.epage=62&rft.pages=55-62&rft.issn=1550-6533&rft.eissn=2643-3001&rft.isbn=0769527043&rft.isbn_list=9780769527048&rft_id=info:doi/10.1109/SBAC-PAD.2006.14&rft_dat=%3Cieee_6IE%3E4032416%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4032416&rfr_iscdi=true