Optimizing sDTW for AMD GPUs

Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leve...

Detailed description

Bibliographic details
Published in: arXiv.org 2024-03
Main authors: Latta-Lin, Daniel; Sofia Isadora Padilla Munoz
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Latta-Lin, Daniel
Sofia Isadora Padilla Munoz
description Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2955956901</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2955956901</sourcerecordid><originalsourceid>FETCH-proquest_journals_29559569013</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSQ8S8oyczNrMrMS1codgkJV0jLL1Jw9HVRcA8ILeZhYE1LzClO5YXS3AzKbq4hzh66BUX5haWpxSXxWfmlRXlAqXgjS1NTS1MzSwNDY-JUAQC9zyqL</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2955956901</pqid></control><display><type>article</type><title>Optimizing sDTW for AMD GPUs</title><source>Freely Accessible Journals</source><creator>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</creator><creatorcontrib>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</creatorcontrib><description>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. 
By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Wave fronts</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Latta-Lin, Daniel</creatorcontrib><creatorcontrib>Sofia Isadora Padilla Munoz</creatorcontrib><title>Optimizing sDTW for AMD GPUs</title><title>arXiv.org</title><description>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. 
By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</description><subject>Wave fronts</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSQ8S8oyczNrMrMS1codgkJV0jLL1Jw9HVRcA8ILeZhYE1LzClO5YXS3AzKbq4hzh66BUX5haWpxSXxWfmlRXlAqXgjS1NTS1MzSwNDY-JUAQC9zyqL</recordid><startdate>20240311</startdate><enddate>20240311</enddate><creator>Latta-Lin, Daniel</creator><creator>Sofia Isadora Padilla Munoz</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240311</creationdate><title>Optimizing sDTW for AMD GPUs</title><author>Latta-Lin, Daniel ; Sofia Isadora Padilla Munoz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29559569013</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Wave fronts</topic><toplevel>online_resources</toplevel><creatorcontrib>Latta-Lin, Daniel</creatorcontrib><creatorcontrib>Sofia Isadora Padilla Munoz</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering 
Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Latta-Lin, Daniel</au><au>Sofia Isadora Padilla Munoz</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Optimizing sDTW for AMD GPUs</atitle><jtitle>arXiv.org</jtitle><date>2024-03-11</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Subsequence Dynamic Time Warping (sDTW) is the metric of choice when performing many sequence matching and alignment tasks. While sDTW is flexible and accurate, it is neither simple nor fast to compute; significant research effort has been spent devising parallel implementations on the GPU that leverage efficient memory access and computation patterns, as well as features offered by specific vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation employs well-known parallel patterns, as well as lower-level features offered by ROCm. We use shuffling for intra-wavefront communication and shared memory to transfer data between consecutive wavefronts. 
By constraining the input data to batches of 512 queries of length 2,000, we optimized for peak performance the width of reference elements operated on by a single thread.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-03
issn 2331-8422
language eng
recordid cdi_proquest_journals_2955956901
source Freely Accessible Journals
subjects Wave fronts
title Optimizing sDTW for AMD GPUs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T19%3A24%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Optimizing%20sDTW%20for%20AMD%20GPUs&rft.jtitle=arXiv.org&rft.au=Latta-Lin,%20Daniel&rft.date=2024-03-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2955956901%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2955956901&rft_id=info:pmid/&rfr_iscdi=true