GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data
DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data create a...
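
For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal software sketch of its two stages, matrix fill and traceback. It is not the GANDAFL architecture and not Bowtie2 code; the function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`, linear gap penalty) are assumptions chosen only for illustration.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment of `read` against `ref` with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the paper.
    Returns (best_score, aligned_read, aligned_ref).
    """
    rows, cols = len(read) + 1, len(ref) + 1

    # Matrix-fill stage: H[i][j] holds the best local alignment score
    # of any alignment ending at read[i-1] and ref[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # match / mismatch
                          H[i - 1][j] + gap,       # gap in the reference
                          H[i][j - 1] + gap)       # gap in the read
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback stage: walk back from the best-scoring cell until the
    # score drops to zero, rebuilding the aligned strings in reverse.
    i, j = best_pos
    out_read, out_ref = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_read.append(read[i - 1])
            out_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_ref.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_ref))


score, aligned_read, aligned_ref = smith_waterman("ACGT", "TTACGTTT")
print(score, aligned_read, aligned_ref)
```

The fill loop visits every cell of a (read length + 1) by (reference length + 1) matrix, which is the quadratic cost the abstract refers to as computational intensity.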
Published in: | IEEE transactions on computers 2022-11, Vol.71 (11), p.3018-3031 |
---|---|
Main authors: | Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |

container_end_page | 3031 |
---|---|
container_issue | 11 |
container_start_page | 3018 |
container_title | IEEE transactions on computers |
container_volume | 71 |
creator | Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios |
description | DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions, effective co-design of NGS short-read alignment still remains an open issue, mainly due to a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for the SmW Matrix-fill and Traceback stages to perform high-throughput short-read alignment on NGS data. We then propose a radical software restructuring of the widely used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes the calling overhead of the accelerators, whereas moving both Matrix-fill and Traceback on chip eliminates the communication data overheads. The standalone solution delivers up to 116× and 2× speedup over state-of-the-art software and hardware accelerators, respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a 1.9× speedup. |
doi_str_mv | 10.1109/TC.2022.3144115 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2723900471</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9711928</ieee_id><sourcerecordid>2723900471</sourcerecordid><originalsourceid>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</originalsourceid><addsrcrecordid>eNo9kEFLw0AQhRdRsFbPHrwEPCedmd1Nst5CaqtQKth6XtZkV1PSpG5SxH9vaounOcz73oOPsVuECBHUZJ1HBEQRRyEQ5RkboZRJqJSMz9kIANNQcQGX7KrrNgAQE6gRy-fZcprNFg_B1PTG1e13kBWFra03fdU2gWt9sPpsfR-8WlMGWV19NFvb9MHwW85Xf9Q1u3Cm7uzN6Y7Z2-xxnT-Fi5f5c54twoJQ9aFLYoWlEI6bctimUhIny8vUxQiCk-MOqYjfjUFCWRQJmFRSyYUzUnEj-ZjdH3t3vv3a267Xm3bvm2FSU0JcAYgEh9TkmCp823XeOr3z1db4H42gD6b0OtcHU_pkaiDujkRlrf1PqwRRUcp_Ab66YK8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2723900471</pqid></control><display><type>article</type><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><source>IEEE Electronic Library (IEL)</source><creator>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios</creator><creatorcontrib>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, Dimitrios</creatorcontrib><description>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. 
In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</description><identifier>ISSN: 0018-9340</identifier><identifier>EISSN: 1557-9956</identifier><identifier>DOI: 10.1109/TC.2022.3144115</identifier><identifier>CODEN: ITCOB4</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accelerators ; Algorithms ; Alignment ; Bioinformatics ; Bowtie2 ; Co-design ; Communications systems ; dataflow computing ; DNA ; Field programmable gate arrays ; Genomics ; Next generation sequencing ; reconfigurable acceleration ; Sequential analysis ; smith waterman ; Software ; String matching ; Task analysis ; traceback</subject><ispartof>IEEE transactions on computers, 2022-11, Vol.71 (11), p.3018-3031</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</citedby><cites>FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</cites><orcidid>0000-0002-6930-6847 ; 0000-0003-3151-2730 ; 0000-0003-0064-7616</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9711928$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9711928$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Koliogeorgi, Konstantina</creatorcontrib><creatorcontrib>Xydis, Sotirios</creatorcontrib><creatorcontrib>Gaydadjiev, Georgi</creatorcontrib><creatorcontrib>Soudris, Dimitrios</creatorcontrib><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><title>IEEE transactions on computers</title><addtitle>TC</addtitle><description>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. 
In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</description><subject>Accelerators</subject><subject>Algorithms</subject><subject>Alignment</subject><subject>Bioinformatics</subject><subject>Bowtie2</subject><subject>Co-design</subject><subject>Communications systems</subject><subject>dataflow computing</subject><subject>DNA</subject><subject>Field programmable gate arrays</subject><subject>Genomics</subject><subject>Next generation sequencing</subject><subject>reconfigurable acceleration</subject><subject>Sequential analysis</subject><subject>smith waterman</subject><subject>Software</subject><subject>String matching</subject><subject>Task 
analysis</subject><subject>traceback</subject><issn>0018-9340</issn><issn>1557-9956</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFLw0AQhRdRsFbPHrwEPCedmd1Nst5CaqtQKth6XtZkV1PSpG5SxH9vaounOcz73oOPsVuECBHUZJ1HBEQRRyEQ5RkboZRJqJSMz9kIANNQcQGX7KrrNgAQE6gRy-fZcprNFg_B1PTG1e13kBWFra03fdU2gWt9sPpsfR-8WlMGWV19NFvb9MHwW85Xf9Q1u3Cm7uzN6Y7Z2-xxnT-Fi5f5c54twoJQ9aFLYoWlEI6bctimUhIny8vUxQiCk-MOqYjfjUFCWRQJmFRSyYUzUnEj-ZjdH3t3vv3a267Xm3bvm2FSU0JcAYgEh9TkmCp823XeOr3z1db4H42gD6b0OtcHU_pkaiDujkRlrf1PqwRRUcp_Ab66YK8</recordid><startdate>20221101</startdate><enddate>20221101</enddate><creator>Koliogeorgi, Konstantina</creator><creator>Xydis, Sotirios</creator><creator>Gaydadjiev, Georgi</creator><creator>Soudris, Dimitrios</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-6930-6847</orcidid><orcidid>https://orcid.org/0000-0003-3151-2730</orcidid><orcidid>https://orcid.org/0000-0003-0064-7616</orcidid></search><sort><creationdate>20221101</creationdate><title>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</title><author>Koliogeorgi, Konstantina ; Xydis, Sotirios ; Gaydadjiev, Georgi ; Soudris, 
Dimitrios</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c219t-f7691d44f3ad2092d5232e3d8f610432f3f12c6baa1215cc70a852d34fa593a53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accelerators</topic><topic>Algorithms</topic><topic>Alignment</topic><topic>Bioinformatics</topic><topic>Bowtie2</topic><topic>Co-design</topic><topic>Communications systems</topic><topic>dataflow computing</topic><topic>DNA</topic><topic>Field programmable gate arrays</topic><topic>Genomics</topic><topic>Next generation sequencing</topic><topic>reconfigurable acceleration</topic><topic>Sequential analysis</topic><topic>smith waterman</topic><topic>Software</topic><topic>String matching</topic><topic>Task analysis</topic><topic>traceback</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Koliogeorgi, Konstantina</creatorcontrib><creatorcontrib>Xydis, Sotirios</creatorcontrib><creatorcontrib>Gaydadjiev, Georgi</creatorcontrib><creatorcontrib>Soudris, Dimitrios</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on computers</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Koliogeorgi, Konstantina</au><au>Xydis, Sotirios</au><au>Gaydadjiev, Georgi</au><au>Soudris, Dimitrios</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data</atitle><jtitle>IEEE transactions on computers</jtitle><stitle>TC</stitle><date>2022-11-01</date><risdate>2022</risdate><volume>71</volume><issue>11</issue><spage>3018</spage><epage>3031</epage><pages>3018-3031</pages><issn>0018-9340</issn><eissn>1557-9956</eissn><coden>ITCOB4</coden><abstract>DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. 
The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TC.2022.3144115</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-6930-6847</orcidid><orcidid>https://orcid.org/0000-0003-3151-2730</orcidid><orcidid>https://orcid.org/0000-0003-0064-7616</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9340 |
ispartof | IEEE transactions on computers, 2022-11, Vol.71 (11), p.3018-3031 |
issn | 0018-9340 ; 1557-9956 |
language | eng |
recordid | cdi_proquest_journals_2723900471 |
source | IEEE Electronic Library (IEL) |
subjects | Accelerators ; Algorithms ; Alignment ; Bioinformatics ; Bowtie2 ; Co-design ; Communications systems ; dataflow computing ; DNA ; Field programmable gate arrays ; Genomics ; Next generation sequencing ; reconfigurable acceleration ; Sequential analysis ; smith waterman ; Software ; String matching ; Task analysis ; traceback |
title | GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T14%3A26%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GANDAFL:%20Dataflow%20Acceleration%20for%20Short%20Read%20Alignment%20on%20NGS%20Data&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Koliogeorgi,%20Konstantina&rft.date=2022-11-01&rft.volume=71&rft.issue=11&rft.spage=3018&rft.epage=3031&rft.pages=3018-3031&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2022.3144115&rft_dat=%3Cproquest_RIE%3E2723900471%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2723900471&rft_id=info:pmid/&rft_ieee_id=9711928&rfr_iscdi=true |
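
The `url` field is an OpenURL link in key/encoded-value form (its `ctx_ver` is `Z39.88-2004`), in which the `rft.*` keys carry the citation metadata of the article. As a rough sketch of how those values could be read back out, the snippet below uses Python's standard `urllib.parse`; the helper name `openurl_fields` is hypothetical, and `example` is a shortened copy of the link above, not the full URL.

```python
from urllib.parse import urlsplit, parse_qs


def openurl_fields(url):
    """Return the rft.* (referent) keys of an OpenURL query as a dict.

    Hypothetical helper for illustration; keys are returned without the
    'rft.' prefix, values are percent-decoded.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; keep the first one.
    return {key[4:]: values[0]
            for key, values in query.items()
            if key.startswith("rft.")}


# Shortened copy of the record's link (most parameters omitted).
example = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=GANDAFL:%20Dataflow%20Acceleration"
    "&rft.volume=71&rft.issue=11&rft.spage=3018&rft.epage=3031"
    "&rft.issn=0018-9340"
    "&rft_id=info:doi/10.1109/TC.2022.3144115"
)

fields = openurl_fields(example)
print(fields["atitle"], fields["volume"], fields["spage"], fields["epage"])
```

Keys such as `rft_id` use an underscore rather than a dot and are therefore left out by this filter; a fuller parser would treat them separately.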