The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores

High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable...

Bibliographic details
Published in: IEEE transactions on multi-scale computing systems 2018-04, Vol.4 (2), p.99-112
Main authors: Loi, Igor; Capotondi, Alessandro; Rossi, Davide; Marongiu, Andrea; Benini, Luca
Format: Article
Language: eng
container_end_page 112
container_issue 2
container_start_page 99
container_title IEEE transactions on multi-scale computing systems
container_volume 4
creator Loi, Igor
Capotondi, Alessandro
Rossi, Davide
Marongiu, Andrea
Benini, Luca
description High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. In this work, we tackle one of the most critical energy-efficiency bottlenecks for these architectures: the instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40 percent, energy efficiency by up to 20 percent, and energy × area efficiency by up to 30 percent with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20 percent better energy × area efficiency than the multi-port, and up to 30 percent better energy efficiency than the private cache.
doi_str_mv 10.1109/TMSCS.2017.2769046
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2299130919</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8094020</ieee_id><sourcerecordid>2299130919</sourcerecordid><originalsourceid>FETCH-LOGICAL-c339t-49884b86caa46aee55d66029c0163ead48a6cbcca81f6f601a1bf3725876709e3</originalsourceid><addsrcrecordid>eNpNkMFOAjEQhhujiQR5Ab000evitN1tt0ezopJAxADnppQpLsFdbJcQ3t5FiPE0c_i_mT8fIbcM-oyBfpyNp8W0z4GpPldSQyovSIcLwROlpLz8t1-TXoxrAGASQKisQyazT6QfO4wN9XWggwrD6pAMvC9diVVDhw_0GWO5qmhZ0fmmCTYZ1ftkUu8x0GKziw0GXNKxrQ5JUQeMN-TK203E3nl2yfxlMCvektH767B4GiVOCN0kqc7zdJFLZ20qLWKWLaUErl1bTaBdprmVbuGczZmXXgKzbOGF4lmupAKNokvuT3e3of4-9jfreheq9qXhXGsmQDPdpvgp5UIdY0BvtqH8suFgGJijPPMrzxzlmbO8Fro7QSUi_gE56BQ4iB97O2lY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2299130919</pqid></control><display><type>article</type><title>The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores</title><source>IEEE Electronic Library (IEL)</source><creator>Loi, Igor ; Capotondi, Alessandro ; Rossi, Davide ; Marongiu, Andrea ; Benini, Luca</creator><creatorcontrib>Loi, Igor ; Capotondi, Alessandro ; Rossi, Davide ; Marongiu, Andrea ; Benini, Luca</creatorcontrib><description>High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. In this work, we tackle one of the most critical energy-efficiency bottlenecks for these architectures: instruction memory hierarchy. 
Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working-sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40 percent, energy efficiency by up to 20 percent, and energy × area efficiency by up to 30 percent with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20 percent better energy x area efficiency than the multi-port, and up to 30 percent better energy efficiency than private cache.</description><identifier>ISSN: 2332-7766</identifier><identifier>EISSN: 2332-7766</identifier><identifier>DOI: 10.1109/TMSCS.2017.2769046</identifier><identifier>CODEN: ITMCFM</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Computer architecture ; Energy efficiency ; Energy management ; Instruction cache ; Internet of Things ; Libraries ; Memory management ; Microcontrollers ; Microprocessors ; near-threshold computing ; parallel architectures ; Power efficiency ; Power management ; Processors ; Program processors ; Programmable controllers ; Random access memory ; Signal processing ; tightly coupled cluster</subject><ispartof>IEEE transactions on multi-scale computing systems, 2018-04, Vol.4 (2), p.99-112</ispartof><rights>Copyright The Institute of 
Electrical and Electronics Engineers, Inc. (IEEE) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c339t-49884b86caa46aee55d66029c0163ead48a6cbcca81f6f601a1bf3725876709e3</citedby><cites>FETCH-LOGICAL-c339t-49884b86caa46aee55d66029c0163ead48a6cbcca81f6f601a1bf3725876709e3</cites><orcidid>0000-0003-3852-4662 ; 0000-0003-1010-4762 ; 0000-0001-8705-0761 ; 0000-0002-0651-5393 ; 0000-0001-8068-3806</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8094020$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8094020$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Loi, Igor</creatorcontrib><creatorcontrib>Capotondi, Alessandro</creatorcontrib><creatorcontrib>Rossi, Davide</creatorcontrib><creatorcontrib>Marongiu, Andrea</creatorcontrib><creatorcontrib>Benini, Luca</creatorcontrib><title>The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores</title><title>IEEE transactions on multi-scale computing systems</title><addtitle>TMSCS</addtitle><description>High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. 
In this work, we tackle one of the most critical energy-efficiency bottlenecks for these architectures: instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working-sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40 percent, energy efficiency by up to 20 percent, and energy × area efficiency by up to 30 percent with respect to the private cache. 
The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20 percent better energy x area efficiency than the multi-port, and up to 30 percent better energy efficiency than private cache.</description><subject>Computer architecture</subject><subject>Energy efficiency</subject><subject>Energy management</subject><subject>Instruction cache</subject><subject>Internet of Things</subject><subject>Libraries</subject><subject>Memory management</subject><subject>Microcontrollers</subject><subject>Microprocessors</subject><subject>near-threshold computing</subject><subject>parallel architectures</subject><subject>Power efficiency</subject><subject>Power management</subject><subject>Processors</subject><subject>Program processors</subject><subject>Programmable controllers</subject><subject>Random access memory</subject><subject>Signal processing</subject><subject>tightly coupled cluster</subject><issn>2332-7766</issn><issn>2332-7766</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkMFOAjEQhhujiQR5Ab000evitN1tt0ezopJAxADnppQpLsFdbJcQ3t5FiPE0c_i_mT8fIbcM-oyBfpyNp8W0z4GpPldSQyovSIcLwROlpLz8t1-TXoxrAGASQKisQyazT6QfO4wN9XWggwrD6pAMvC9diVVDhw_0GWO5qmhZ0fmmCTYZ1ftkUu8x0GKziw0GXNKxrQ5JUQeMN-TK203E3nl2yfxlMCvektH767B4GiVOCN0kqc7zdJFLZ20qLWKWLaUErl1bTaBdprmVbuGczZmXXgKzbOGF4lmupAKNokvuT3e3of4-9jfreheq9qXhXGsmQDPdpvgp5UIdY0BvtqH8suFgGJijPPMrzxzlmbO8Fro7QSUi_gE56BQ4iB97O2lY</recordid><startdate>20180401</startdate><enddate>20180401</enddate><creator>Loi, Igor</creator><creator>Capotondi, Alessandro</creator><creator>Rossi, Davide</creator><creator>Marongiu, Andrea</creator><creator>Benini, Luca</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-3852-4662</orcidid><orcidid>https://orcid.org/0000-0003-1010-4762</orcidid><orcidid>https://orcid.org/0000-0001-8705-0761</orcidid><orcidid>https://orcid.org/0000-0002-0651-5393</orcidid><orcidid>https://orcid.org/0000-0001-8068-3806</orcidid></search><sort><creationdate>20180401</creationdate><title>The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores</title><author>Loi, Igor ; Capotondi, Alessandro ; Rossi, Davide ; Marongiu, Andrea ; Benini, Luca</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c339t-49884b86caa46aee55d66029c0163ead48a6cbcca81f6f601a1bf3725876709e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computer architecture</topic><topic>Energy efficiency</topic><topic>Energy management</topic><topic>Instruction cache</topic><topic>Internet of Things</topic><topic>Libraries</topic><topic>Memory management</topic><topic>Microcontrollers</topic><topic>Microprocessors</topic><topic>near-threshold computing</topic><topic>parallel architectures</topic><topic>Power efficiency</topic><topic>Power management</topic><topic>Processors</topic><topic>Program processors</topic><topic>Programmable controllers</topic><topic>Random access memory</topic><topic>Signal processing</topic><topic>tightly coupled cluster</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Loi, Igor</creatorcontrib><creatorcontrib>Capotondi, Alessandro</creatorcontrib><creatorcontrib>Rossi, Davide</creatorcontrib><creatorcontrib>Marongiu, Andrea</creatorcontrib><creatorcontrib>Benini, Luca</creatorcontrib><collection>IEEE All-Society Periodicals Package 
(ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on multi-scale computing systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Loi, Igor</au><au>Capotondi, Alessandro</au><au>Rossi, Davide</au><au>Marongiu, Andrea</au><au>Benini, Luca</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores</atitle><jtitle>IEEE transactions on multi-scale computing systems</jtitle><stitle>TMSCS</stitle><date>2018-04-01</date><risdate>2018</risdate><volume>4</volume><issue>2</issue><spage>99</spage><epage>112</epage><pages>99-112</pages><issn>2332-7766</issn><eissn>2332-7766</eissn><coden>ITMCFM</coden><abstract>High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications. While traditional Ultra-Low-Power designs rely on single-core micro-controllers (MCU), a new generation of architectures leveraging fully programmable tightly-coupled clusters of near-threshold processors is emerging, joining the performance gain of parallel execution over multiple cores with the energy efficiency of low-voltage operation. 
In this work, we tackle one of the most critical energy-efficiency bottlenecks for these architectures: instruction memory hierarchy. Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging a crossbar between processors and single-port banks (SP), and one leveraging banks with multiple read ports (MP). We evaluate the proposed architectures on a set of signal processing applications with different executable sizes and working-sets. The results show that the shared cache architectures are able to efficiently execute a much wider set of applications (including those featuring large memory footprint and irregular access patterns) with a much smaller area and with much better energy efficiency with respect to the private cache. The multi-port cache is suitable for sizes up to a few kB, improving performance by up to 40 percent, energy efficiency by up to 20 percent, and energy × area efficiency by up to 30 percent with respect to the private cache. The single-port solution is more suitable for larger cache sizes (up to 16 kB), providing up to 20 percent better energy x area efficiency than the multi-port, and up to 30 percent better energy efficiency than private cache.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TMSCS.2017.2769046</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-3852-4662</orcidid><orcidid>https://orcid.org/0000-0003-1010-4762</orcidid><orcidid>https://orcid.org/0000-0001-8705-0761</orcidid><orcidid>https://orcid.org/0000-0002-0651-5393</orcidid><orcidid>https://orcid.org/0000-0001-8068-3806</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2332-7766
ispartof IEEE transactions on multi-scale computing systems, 2018-04, Vol.4 (2), p.99-112
issn 2332-7766
2332-7766
language eng
recordid cdi_proquest_journals_2299130919
source IEEE Electronic Library (IEL)
subjects Computer architecture
Energy efficiency
Energy management
Instruction cache
Internet of Things
Libraries
Memory management
Microcontrollers
Microprocessors
near-threshold computing
parallel architectures
Power efficiency
Power management
Processors
Program processors
Programmable controllers
Random access memory
Signal processing
tightly coupled cluster
title The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T05%3A39%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Quest%20for%20Energy-Efficient%20I$%20Design%20in%20Ultra-Low-Power%20Clustered%20Many-Cores&rft.jtitle=IEEE%20transactions%20on%20multi-scale%20computing%20systems&rft.au=Loi,%20Igor&rft.date=2018-04-01&rft.volume=4&rft.issue=2&rft.spage=99&rft.epage=112&rft.pages=99-112&rft.issn=2332-7766&rft.eissn=2332-7766&rft.coden=ITMCFM&rft_id=info:doi/10.1109/TMSCS.2017.2769046&rft_dat=%3Cproquest_RIE%3E2299130919%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2299130919&rft_id=info:pmid/&rft_ieee_id=8094020&rfr_iscdi=true