Optimized Spatial Architecture Mapping Flow for Transformer Accelerators
Recent innovations in Transformer-based large language models have significantly advanced the field of general-purpose neural language understanding and generation. With billions of trainable parameters, deployment of these large models relies on high-performance hardware accelerators to efficiently...
Main authors: | Xu, Haocheng; Tahmasebi, Faraz; Qiao, Ye; Tian, Hongzheng; Kwon, Hyoukjun; Huang, Sitao |
---|---|
Format: | Article |
Language: | eng |
Online access: | Order full text |
creator | Xu, Haocheng ; Tahmasebi, Faraz ; Qiao, Ye ; Tian, Hongzheng ; Kwon, Hyoukjun ; Huang, Sitao |
---|---|
description | Recent innovations in Transformer-based large language models have
significantly advanced the field of general-purpose neural language
understanding and generation. With billions of trainable parameters, deployment
of these large models relies on high-performance hardware accelerators to
efficiently deliver the required computation. Spatial architectures, such as
TPUs, offer a promising solution to accelerating computation-intensive
workloads. However, the design process for existing spatial architectures is
predominantly manual, and it often involves time-consuming redesigns for new
applications and new problem dimensions, which greatly limits the development
of optimally designed accelerators for Transformer models. To address these
challenges, we propose SAMT (Spatial Architecture Mapping for Transformers), a
comprehensive framework designed to optimize the dataflow mapping of
Transformer inference workloads onto spatial accelerators. We demonstrate the
effectiveness of SAMT in improving the performance of spatial accelerators for
Transformer models. We propose and leverage the dynamic operator fusion schemes
for the Transformer models and co-search the optimal dataflow mapping
strategies for spatial accelerators. SAMT significantly reduces inference
latency by 12% to 91% and energy consumption by 3% to 23% for evaluated
Transformer models compared to traditional spatial accelerator designs among
edge, mobile and cloud settings. |
doi_str_mv | 10.48550/arxiv.2410.07407 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.07407 |
language | eng |
recordid | cdi_arxiv_primary_2410_07407 |
source | arXiv.org |
subjects | Computer Science - Hardware Architecture |
title | Optimized Spatial Architecture Mapping Flow for Transformer Accelerators |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T17%3A50%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimized%20Spatial%20Architecture%20Mapping%20Flow%20for%20Transformer%20Accelerators&rft.au=Xu,%20Haocheng&rft.date=2024-10-09&rft_id=info:doi/10.48550/arxiv.2410.07407&rft_dat=%3Carxiv_GOX%3E2410_07407%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |