CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference

Recurrent neural networks (RNNs) perform element-wise multiplications across the activations of gates. We show that a significant percentage of activations are saturated and propose coarse-grained pruning of activations (CGPA) to avoid the computation of entire neurons, based on the activation values of the gates. We show that CGPA can be easily implemented on top of a TPU-like architecture with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs.
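The idea in the abstract can be illustrated with a minimal sketch: gate activations pass through a sigmoid, which saturates near 0 for strongly negative pre-activations, and a neuron whose gate is saturated at ~0 contributes essentially nothing, so its computation can be skipped wholesale. This is a hypothetical illustration only, not the paper's implementation; the 0.05 threshold and the toy gate values are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    # Logistic gate activation; saturates toward 0 for very negative inputs.
    return 1.0 / (1.0 + np.exp(-x))

def cgpa_keep_mask(gate_preact, threshold=0.05):
    """Coarse-grained pruning sketch: keep only neurons whose gate
    activation exceeds a saturation threshold; the rest are treated
    as zero and their computation is skipped entirely."""
    return sigmoid(gate_preact) > threshold

# Toy example: 8 neurons; strongly negative pre-activations saturate at ~0.
pre = np.array([-8.0, 2.0, -7.5, 0.5, -9.0, 1.2, -6.0, 3.0])
mask = cgpa_keep_mask(pre)

# Only the unpruned neurons need their outputs computed; pruned ones are zero.
hidden = np.zeros_like(pre)
hidden[mask] = np.tanh(pre[mask]) * sigmoid(pre[mask])
```

Here half the neurons are pruned, which is the source of the reported speedup: in a real accelerator the skipped neurons also remove the corresponding rows of the matrix-vector products at the next time step.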

Full Description

Saved in:
Bibliographic Details
Published in: IEEE MICRO, 2019-09, Vol. 39 (5), p. 36-45
Main authors: Riera, Marc; Arnau, Jose-Maria; Gonzalez, Antonio
Format: Article
Language: English
description Recurrent neural networks (RNNs) perform element-wise multiplications across the activations of gates. We show that a significant percentage of activations are saturated and propose coarse-grained pruning of activations (CGPA) to avoid the computation of entire neurons, based on the activation values of the gates. We show that CGPA can be easily implemented on top of a TPU-like architecture with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs.
doi_str_mv 10.1109/MM.2019.2929742
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 0272-1732
ispartof IEEE MICRO, 2019-09, Vol.39 (5), p.36-45
issn 0272-1732
1937-4143
language eng
recordid cdi_csuc_recercat_oai_recercat_cat_2072_364080
source IEEE Electronic Library (IEL)
subjects Accelerators
Artificial intelligence
Computer science
Deep learning
Energy efficiency
Histograms
Logic gates
Low Energy
Machine learning
Mathematical model
Neurons
Pruning
Recurrent neural networks
RNN
UPC subject areas
title CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference