GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data

DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data create a...

Detailed description

Bibliographic details
Published in: IEEE transactions on computers 2022-11, Vol.71 (11), p.3018-3031
Main authors: Koliogeorgi, Konstantina; Xydis, Sotirios; Gaydadjiev, Georgi; Soudris, Dimitrios
Format: Article
Language: eng
container_end_page 3031
container_issue 11
container_start_page 3018
container_title IEEE transactions on computers
container_volume 71
creator Koliogeorgi, Konstantina
Xydis, Sotirios
Gaydadjiev, Georgi
Soudris, Dimitrios
description DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions, effective co-design of NGS short-read alignment still remains an open issue, mainly due to a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for SmW Matrix-fill and Traceback stages to perform high-throughput short-read alignment on NGS data. We then propose a radical software restructuring of the widely used Bowtie2 aligner that allows read alignment in batches to expose acceleration capabilities. Batch alignment minimizes the calling overhead of the accelerators, whereas moving both Matrix-fill and Traceback on chip eliminates the data communication overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.
doi_str_mv 10.1109/TC.2022.3144115
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2723900471</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9711928</ieee_id><sourcerecordid>2723900471</sourcerecordid><originalsourceid>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</originalsourceid><addsrcrecordid>eNo9kEFLw0AQhRdRsFbPHrwEPCedmd1Nst5CaqtQKth6XtZkV1PSpG5SxH9vaounOcz73oOPsVuECBHUZJ1HBEQRRyEQ5RkboZRJqJSMz9kIANNQcQGX7KrrNgAQE6gRy-fZcprNFg_B1PTG1e13kBWFra03fdU2gWt9sPpsfR-8WlMGWV19NFvb9MHwW85Xf9Q1u3Cm7uzN6Y7Z2-xxnT-Fi5f5c54twoJQ9aFLYoWlEI6bctimUhIny8vUxQiCk-MOqYjfjUFCWRQJmFRSyYUzUnEj-ZjdH3t3vv3a267Xm3bvm2FSU0JcAYgEh9TkmCp823XeOr3z1db4H42gD6b0OtcHU_pkaiDujkRlrf1PqwRRUcp_Ab66YK8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2723900471</pqid></control><display><type>article</type><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><source>IEEE Electronic Library (IEL)</source><creator>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios</creator><creatorcontrib>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios</creatorcontrib><description>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. 
In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. In this paper, we first propose GANDAFL , a novel G enome A ligNment DA ta- FL ow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</description><identifier>ISSN: 0018-9340</identifier><identifier>EISSN: 1557-9956</identifier><identifier>DOI: 10.1109/TC.2022.3144115</identifier><identifier>CODEN: ITCOB4</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accelerators ; Algorithms ; Alignment ; Bioinformatics ; Bowtie2 ; Co-design ; Communications systems ; dataflow computing ; DNA ; Field programmable gate arrays ; Genomics ; Next generation sequencing ; reconfigurable acceleration ; Sequential analysis ; smith waterman ; Software ; String matching ; Task analysis ; traceback</subject><ispartof>IEEE transactions on computers, 2022-11, Vol.71 (11), p.3018-3031</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</citedby><cites>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</cites><orcidid>0000-0002-6930-6847 ; 0000-0003-3151-2730 ; 0000-0003-0064-7616</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9711928$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9711928$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Koliogeorgi, Konstantina</creatorcontrib><creatorcontrib>Xydis, Sotirios</creatorcontrib><creatorcontrib>Gaydadjiev, Georgi</creatorcontrib><creatorcontrib>Soudris, Dimitrios</creatorcontrib><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><title>IEEE transactions on computers</title><addtitle>TC</addtitle><description>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. 
In this paper, we first propose GANDAFL , a novel G enome A ligNment DA ta- FL ow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</description><subject>Accelerators</subject><subject>Algorithms</subject><subject>Alignment</subject><subject>Bioinformatics</subject><subject>Bowtie2</subject><subject>Co-design</subject><subject>Communications systems</subject><subject>dataflow computing</subject><subject>DNA</subject><subject>Field programmable gate arrays</subject><subject>Genomics</subject><subject>Next generation sequencing</subject><subject>reconfigurable acceleration</subject><subject>Sequential analysis</subject><subject>smith waterman</subject><subject>Software</subject><subject>String matching</subject><subject>Task 
analysis</subject><subject>traceback</subject><issn>0018-9340</issn><issn>1557-9956</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFLw0AQhRdRsFbPHrwEPCedmd1Nst5CaqtQKth6XtZkV1PSpG5SxH9vaounOcz73oOPsVuECBHUZJ1HBEQRRyEQ5RkboZRJqJSMz9kIANNQcQGX7KrrNgAQE6gRy-fZcprNFg_B1PTG1e13kBWFra03fdU2gWt9sPpsfR-8WlMGWV19NFvb9MHwW85Xf9Q1u3Cm7uzN6Y7Z2-xxnT-Fi5f5c54twoJQ9aFLYoWlEI6bctimUhIny8vUxQiCk-MOqYjfjUFCWRQJmFRSyYUzUnEj-ZjdH3t3vv3a267Xm3bvm2FSU0JcAYgEh9TkmCp823XeOr3z1db4H42gD6b0OtcHU_pkaiDujkRlrf1PqwRRUcp_Ab66YK8</recordid><startdate>20221101</startdate><enddate>20221101</enddate><creator>Koliogeorgi, Konstantina</creator><creator>Xydis, Sotirios</creator><creator>Gaydadjiev, Georgi</creator><creator>Soudris, Dimitrios</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-6930-6847</orcidid><orcidid>https://orcid.org/0000-0003-3151-2730</orcidid><orcidid>https://orcid.org/0000-0003-0064-7616</orcidid></search><sort><creationdate>20221101</creationdate><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><author>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, 
Dimitrios</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accelerators</topic><topic>Algorithms</topic><topic>Alignment</topic><topic>Bioinformatics</topic><topic>Bowtie2</topic><topic>Co-design</topic><topic>Communications systems</topic><topic>dataflow computing</topic><topic>DNA</topic><topic>Field programmable gate arrays</topic><topic>Genomics</topic><topic>Next generation sequencing</topic><topic>reconfigurable acceleration</topic><topic>Sequential analysis</topic><topic>smith waterman</topic><topic>Software</topic><topic>String matching</topic><topic>Task analysis</topic><topic>traceback</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Koliogeorgi, Konstantina</creatorcontrib><creatorcontrib>Xydis, Sotirios</creatorcontrib><creatorcontrib>Gaydadjiev, Georgi</creatorcontrib><creatorcontrib>Soudris, Dimitrios</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on computers</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Koliogeorgi, Konstantina</au><au>Xydis, Sotirios</au><au>Gaydadjiev, Georgi</au><au>Soudris, Dimitrios</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</atitle><jtitle>IEEE transactions on computers</jtitle><stitle>TC</stitle><date>2022-11-01</date><risdate>2022</risdate><volume>71</volume><issue>11</issue><spage>3018</spage><epage>3031</epage><pages>3018-3031</pages><issn>0018-9340</issn><eissn>1557-9956</eissn><coden>ITCOB4</coden><abstract>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. In this paper, we first propose GANDAFL , a novel G enome A ligNment DA ta- FL ow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. 
The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TC.2022.3144115</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-6930-6847</orcidid><orcidid>https://orcid.org/0000-0003-3151-2730</orcidid><orcidid>https://orcid.org/0000-0003-0064-7616</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9340
ispartof IEEE transactions on computers, 2022-11, Vol.71 (11), p.3018-3031
issn 0018-9340
1557-9956
language eng
recordid cdi_proquest_journals_2723900471
source IEEE Electronic Library (IEL)
subjects Accelerators
Algorithms
Alignment
Bioinformatics
Bowtie2
Co-design
Communications systems
dataflow computing
DNA
Field programmable gate arrays
Genomics
Next generation sequencing
reconfigurable acceleration
Sequential analysis
smith waterman
Software
String matching
Task analysis
traceback
title GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T14%3A26%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GANDAFL:%20Dataflow%20Acceleration%20for%20Short%20Read%20Alignment%20on%20NGS%20Data&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Koliogeorgi,%20Konstantina&rft.date=2022-11-01&rft.volume=71&rft.issue=11&rft.spage=3018&rft.epage=3031&rft.pages=3018-3031&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2022.3144115&rft_dat=%3Cproquest_RIE%3E2723900471%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2723900471&rft_id=info:pmid/&rft_ieee_id=9711928&rfr_iscdi=true