DRAGON: Dynamic Recurrent Accelerator for Graph Online Convolution

Detailed description

Bibliographic details
Published in: ACM Transactions on Design Automation of Electronic Systems, 2023-01, Vol. 28 (1), pp. 1-27, Article 1
Main authors: Romero Hung, José; Li, Chao; Wang, Taolei; Guo, Jinyang; Wang, Pengyu; Shao, Chuanming; Wang, Jing; Shi, Guoyong; Liu, Xiangwen; Wu, Hanqing
Format: Article
Language: English
Keywords:
Online access: Full text
description Despite the extraordinary application potential of dynamic graph inference, its practical, physical implementation has seldom been explored in the literature. Although graph inference through neural networks has seen plenty of algorithmic innovation, its transfer to the physical world has not seen comparable development. This is understandable, since the most prominent Euclidean acceleration techniques from CNNs have little bearing on the non-Euclidean nature of relational graphs. Rather than coping with the challenges that arise from forcing naturally sparse structures into more inflexible stochastic arrangements, DRAGON embraces this characteristic in order to promote acceleration. Inspired by high-performance computing approaches such as Parallel Multi-moth Flame Optimization for Link Prediction (PMFO-LP), we propose and implement a novel, efficient architecture capable of producing speed-up and performance similar to the baseline, but at a fraction of its hardware requirements and power consumption. We leverage the hidden parallel capacity of our previously developed static graph convolutional processor, ACE-GCN, and expand it with RNN structures, allowing the deployment of a multi-processing network referenced around a common pool of proximity-based centroids. Experimental results demonstrate outstanding acceleration. Compared with the fastest CPU-based software implementation available in the literature, DRAGON achieves roughly 191× speed-up. Under the largest configuration and dataset, DRAGON also overtakes the more power-hungry PMFO-LP by almost 1.59× in speed and by around 89.59% in power efficiency. More important than raw acceleration, we demonstrate the unique functional qualities of our approach as a flexible and fault-tolerant solution, making it an interesting alternative for a wide range of application scenarios.
DOI: 10.1145/3524124
Publisher: ACM, New York, NY
Publication date: 2023-01-20
ISSN: 1084-4309
EISSN: 1557-7309
Source: ACM Digital Library
Subjects: Computer systems organization; Computing methodologies; Embedded hardware; Hardware; Hardware accelerators; Neural networks; Real-time system architecture