Infinity Stream: Enabling Transparent and Automated In-Memory Computing

Although in-memory computing is a promising way to alleviate data movement bottlenecks by parallelizing computation across memory bitlines, key challenges from its unique execution model remain unsolved: automatically parallelizing sequential programs; dynamically managing and aligning data in the transposed layout required for bit-serial logic; and mixing in-memory and near-memory computing. These challenges should be solved transparently, to maintain portability without exposing hardware details to programmers. In this work, we introduce a novel intermediate representation, the tensor dataflow graph (tDFG), with tensor nodes representing data spatially unrolled across bitlines and explicit move nodes that align operands in the same bitline, which helps the compiler optimize for massive parallelism and data layout. To maintain transparency and portability, we embed the tDFG directly in the ISA, where it is lowered into bit-serial operations at runtime to hide hardware details. Evaluated on a cycle-accurate simulator across various data-processing workloads, our approach achieves a 4.5× speedup and 52% traffic reduction over a state-of-the-art near-memory computing technique.
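
For illustration only, here is a minimal Python sketch of the kind of graph the abstract describes: tensor nodes for data spatially unrolled across bitlines, explicit move nodes for alignment, and compute nodes for bit-serial operations. All class and field names (TensorNode, MoveNode, bitline_base, and so on) are hypothetical assumptions, not the paper's actual IR or ISA encoding.

```python
# Hypothetical sketch of a tensor-dataflow-graph-style structure.
# Names and fields are illustrative assumptions, not the paper's actual IR.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TensorNode:
    """A tensor spatially unrolled across a range of memory bitlines."""
    name: str
    length: int          # number of elements, one per bitline
    bitline_base: int    # first bitline column holding this tensor


@dataclass
class MoveNode:
    """Explicitly realigns an operand so it shares bitlines with its peer."""
    src: TensorNode
    dst_bitline_base: int


@dataclass
class ComputeNode:
    """An element-wise operation executed bit-serially across bitlines."""
    op: str                   # e.g. "add", "mul"
    inputs: List[TensorNode]
    output: TensorNode


@dataclass
class TensorDataflowGraph:
    nodes: list = field(default_factory=list)

    def add(self, node):
        self.nodes.append(node)
        return node


# Toy example: c = a + b, where b must first be moved onto a's bitlines
# before the bit-serial add can run in place.
g = TensorDataflowGraph()
a = g.add(TensorNode("a", length=1024, bitline_base=0))
b = g.add(TensorNode("b", length=1024, bitline_base=1024))
g.add(MoveNode(src=b, dst_bitline_base=0))          # align b with a
c = g.add(TensorNode("c", length=1024, bitline_base=0))
g.add(ComputeNode(op="add", inputs=[a, b], output=c))
```

The only point of this sketch is the separation the abstract emphasizes: alignment (move nodes) is explicit in the graph, so a compiler or runtime can reason about data layout independently of the bit-serial computation itself.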

Bibliographic details
Published in: IEEE Computer Architecture Letters, 2022-07, Vol. 21 (2), p. 85-88
Authors: Wang, Zhengrong; Liu, Christopher; Nowatzki, Tony
Publisher: New York: IEEE
Format: Article
Language: English
DOI: 10.1109/LCA.2022.3203064
ISSN: 1556-6056
EISSN: 1556-6064
Subjects: Arrays; Computation; Computer memory; Data processing; Graphical representations; Hardware; In-memory computing; Layout; Layouts; Mathematical analysis; Nodes; Parallel processing; Portability; programmer-transparent acceleration; Random access memory; Stream-based ISAs; Tensors; Traffic speed
Online access: Order full text