Optimizing sDTW for AMD GPUs

Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leve...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2024-03
Hauptverfasser:	Latta-Lin, Daniel, Sofia Isadora Padilla Munoz
Format:	Artikel
Sprache:	eng
Schlagworte:	Wave fronts
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Latta-Lin, Daniel Sofia Isadora Padilla Munoz
description	Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2955956901</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2955956901</sourcerecordid><originalsourceid>FETCH-proquest_journals_29559569013</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSQ8S8oyczNrMrMS1codgkJV0jLL1Jw9HVRcA8ILeZhYE1LzClO5YXS3AzKbq4hzh66BUX5haWpxSXxWfmlRXlAqXgjS1NTS1MzSwNDY-JUAQC9zyqL</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2955956901</pqid></control><display><type>article</type><title>Optimizing sDTW for AMD GPUs</title><source>Freely Accessible Journals</source><creator>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</creator><creatorcontrib>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</creatorcontrib><description>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Wave fronts</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Latta-Lin, Daniel</creatorcontrib><creatorcontrib>Sofia Isadora Padilla Munoz</creatorcontrib><title>Optimizing sDTW for AMD GPUs</title><title>arXiv.org</title><description>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</description><subject>Wave fronts</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSQ8S8oyczNrMrMS1codgkJV0jLL1Jw9HVRcA8ILeZhYE1LzClO5YXS3AzKbq4hzh66BUX5haWpxSXxWfmlRXlAqXgjS1NTS1MzSwNDY-JUAQC9zyqL</recordid><startdate>20240311</startdate><enddate>20240311</enddate><creator>Latta-Lin, Daniel</creator><creator>Sofia Isadora Padilla Munoz</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240311</creationdate><title>Optimizing sDTW for AMD GPUs</title><author>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29559569013</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Wave fronts</topic><toplevel>online_resources</toplevel><creatorcontrib>Latta-Lin, Daniel</creatorcontrib><creatorcontrib>Sofia Isadora Padilla Munoz</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Latta-Lin, Daniel</au><au>Sofia Isadora Padilla Munoz</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Optimizing sDTW for AMD GPUs</atitle><jtitle>arXiv.org</jtitle><date>2024-03-11</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2955956901
source	Freely Accessible Journals
subjects	Wave fronts
title	Optimizing sDTW for AMD GPUs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T19%3A24%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimizing%20sDTW%20for%20AMD%20GPUs&rft.jtitle=arXiv.org&rft.au=Latta-Lin,%20Daniel&rft.date=2024-03-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2955956901%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2955956901&rft_id=info:pmid/&rfr_iscdi=true