deBGR: an efficient and near-exact representation of the weighted de Bruijn graph

Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using 'long read' technologies like those offered...

Bibliographic details
Published in:	Bioinformatics (Oxford, England), 2017-07, Vol.33 (14), p.i133-i141
Main authors:	Pandey, Prashant; Bender, Michael A; Johnson, Rob; Patro, Rob
Format:	Article
Language:	eng
Subjects:	Algorithms; Computational Biology - methods; Gene Expression Profiling - methods; Sequence Analysis, RNA - methods; Software
Online access:	Full text

container_end_page	i141
container_issue	14
container_start_page	i133
container_title	Bioinformatics (Oxford, England)
container_volume	33
creator	Pandey, Prashant; Bender, Michael A; Johnson, Rob; Patro, Rob
description	Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using 'long read' technologies like those offered by PacBio or Oxford Nanopore), efficient k-mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn Graphs. Because of the centrality of de Bruijn Graphs, researchers have proposed numerous methods for representing de Bruijn Graphs compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k-mer occurs, which is key in transcriptome assemblers. We present a method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18-28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn Graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn Graph-based systems. Availability: https://github.com/splatlab/debgr. Contact: rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online.
doi_str_mv	10.1093/bioinformatics/btx261
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5870571</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1937517930</sourcerecordid><originalsourceid>FETCH-LOGICAL-c438t-e66f2d66d546ec45cdda63c9168591af51ee6ea1a6c6efa8b503d9e6c892dc7f3</originalsourceid><addsrcrecordid>eNpVUcFuEzEQtRCItoFPAFmcuCy112uvzQGJVrRUqlQVwdlyxuOsq8QOtgPl71mUErWnmdG8ee-NHiFvOPvAmRGny5hjCrlsXItQT5ftvlf8GTnmQo3doDl_fuiZOCIntd4xxiST6iU56rXW3Bh5TG49nl1--0hdohhChIipzYOnCV3p8N5BowW3Beu8mKVyojnQNiH9jXE1NfTUIz0ru3iX6Kq47fSKvAhuXfH1Q12QHxdfvp9_7a5vLq_OP193MAjdOlQq9F4pLweFMEjw3ikBhistDXdBckSFjjsFCoPTS8mEN6hAm97DGMSCfNrzbnfLDXqY_RW3ttsSN678sdlF-3ST4mRX-ZeVemRy5DPBuz1Bri3aCrEhTJBTQmiWD4KJsZ9B7x9USv65w9rsJlbA9dolzLtquRGj5KOZ0Qsi91AoudaC4eCFM_svM_s0M7vPbL57-_iRw9X_kMRf4yKaMA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1937517930</pqid></control><display><type>article</type><title>deBGR: an efficient and near-exact representation of the weighted de Bruijn graph</title><source>MEDLINE</source><source>Access via Oxford University Press (Open Access Collection)</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Pandey, Prashant ; Bender, Michael A ; Johnson, Rob ; Patro, Rob</creator><creatorcontrib>Pandey, Prashant ; Bender, Michael A ; Johnson, Rob ; Patro, Rob</creatorcontrib><description>Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. 
when one is using 'long read' technologies like those offered by PacBio or Oxford Nanopore), efficient k -mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn Graphs. Because of the centrality of de Bruijn Graphs, researchers have proposed numerous methods for representing de Bruijn Graphs compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k -mer occurs, which is key in transcriptome assemblers. We present a method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18-28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn Graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn Graph-based systems. https://github.com/splatlab/debgr . rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btx261</identifier><identifier>PMID: 28881995</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Computational Biology - methods ; Gene Expression Profiling - methods ; Sequence Analysis, RNA - methods ; Software</subject><ispartof>Bioinformatics (Oxford, England), 2017-07, Vol.33 (14), p.i133-i141</ispartof><rights>The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com</rights><rights>The Author 2017. Published by Oxford University Press. 
2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c438t-e66f2d66d546ec45cdda63c9168591af51ee6ea1a6c6efa8b503d9e6c892dc7f3</citedby><cites>FETCH-LOGICAL-c438t-e66f2d66d546ec45cdda63c9168591af51ee6ea1a6c6efa8b503d9e6c892dc7f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870571/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870571/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,315,728,781,785,886,27929,27930,53796,53798</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28881995$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1430372$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Pandey, Prashant</creatorcontrib><creatorcontrib>Bender, Michael A</creatorcontrib><creatorcontrib>Johnson, Rob</creatorcontrib><creatorcontrib>Patro, Rob</creatorcontrib><title>deBGR: an efficient and near-exact representation of the weighted de Bruijn graph</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using 'long read' technologies like those offered by PacBio or Oxford Nanopore), efficient k -mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn Graphs. 
Because of the centrality of de Bruijn Graphs, researchers have proposed numerous methods for representing de Bruijn Graphs compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k -mer occurs, which is key in transcriptome assemblers. We present a method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18-28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn Graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn Graph-based systems. https://github.com/splatlab/debgr . rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>Gene Expression Profiling - methods</subject><subject>Sequence Analysis, RNA - methods</subject><subject>Software</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVUcFuEzEQtRCItoFPAFmcuCy112uvzQGJVrRUqlQVwdlyxuOsq8QOtgPl71mUErWnmdG8ee-NHiFvOPvAmRGny5hjCrlsXItQT5ftvlf8GTnmQo3doDl_fuiZOCIntd4xxiST6iU56rXW3Bh5TG49nl1--0hdohhChIipzYOnCV3p8N5BowW3Beu8mKVyojnQNiH9jXE1NfTUIz0ru3iX6Kq47fSKvAhuXfH1Q12QHxdfvp9_7a5vLq_OP193MAjdOlQq9F4pLweFMEjw3ikBhistDXdBckSFjjsFCoPTS8mEN6hAm97DGMSCfNrzbnfLDXqY_RW3ttsSN678sdlF-3ST4mRX-ZeVemRy5DPBuz1Bri3aCrEhTJBTQmiWD4KJsZ9B7x9USv65w9rsJlbA9dolzLtquRGj5KOZ0Qsi91AoudaC4eCFM_svM_s0M7vPbL57-_iRw9X_kMRf4yKaMA</recordid><startdate>20170715</startdate><enddate>20170715</enddate><creator>Pandey, Prashant</creator><creator>Bender, 
Michael A</creator><creator>Johnson, Rob</creator><creator>Patro, Rob</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope></search><sort><creationdate>20170715</creationdate><title>deBGR: an efficient and near-exact representation of the weighted de Bruijn graph</title><author>Pandey, Prashant ; Bender, Michael A ; Johnson, Rob ; Patro, Rob</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c438t-e66f2d66d546ec45cdda63c9168591af51ee6ea1a6c6efa8b503d9e6c892dc7f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>Gene Expression Profiling - methods</topic><topic>Sequence Analysis, RNA - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pandey, Prashant</creatorcontrib><creatorcontrib>Bender, Michael A</creatorcontrib><creatorcontrib>Johnson, Rob</creatorcontrib><creatorcontrib>Patro, Rob</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pandey, Prashant</au><au>Bender, Michael A</au><au>Johnson, Rob</au><au>Patro, Rob</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>deBGR: an 
efficient and near-exact representation of the weighted de Bruijn graph</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2017-07-15</date><risdate>2017</risdate><volume>33</volume><issue>14</issue><spage>i133</spage><epage>i141</epage><pages>i133-i141</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using 'long read' technologies like those offered by PacBio or Oxford Nanopore), efficient k -mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn Graphs. Because of the centrality of de Bruijn Graphs, researchers have proposed numerous methods for representing de Bruijn Graphs compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k -mer occurs, which is key in transcriptome assemblers. We present a method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18-28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn Graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn Graph-based systems. https://github.com/splatlab/debgr . rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>28881995</pmid><doi>10.1093/bioinformatics/btx261</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics (Oxford, England), 2017-07, Vol.33 (14), p.i133-i141
issn	1367-4803 1367-4811
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5870571
source	MEDLINE; Access via Oxford University Press (Open Access Collection); Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects	Algorithms; Computational Biology - methods; Gene Expression Profiling - methods; Sequence Analysis, RNA - methods; Software
title	deBGR: an efficient and near-exact representation of the weighted de Bruijn graph
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T11%3A35%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=deBGR:%20an%20efficient%20and%20near-exact%20representation%20of%20the%20weighted%20de%20Bruijn%20graph&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Pandey,%20Prashant&rft.date=2017-07-15&rft.volume=33&rft.issue=14&rft.spage=i133&rft.epage=i141&rft.pages=i133-i141&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btx261&rft_dat=%3Cproquest_pubme%3E1937517930%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1937517930&rft_id=info:pmid/28881995&rfr_iscdi=true
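
The description above defines abundance as the number of times each k-mer occurs in the reads. As a rough illustration only, the sketch below counts k-mer abundances exactly with a plain dictionary and turns them into weighted edges, assuming the common convention that each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function names and the toy reads are hypothetical, reverse complements are ignored, and this is not the compact representation used by deBGR or Squeakr.

```python
from collections import Counter


def kmer_abundances(reads, k):
    """Count how many times each k-mer occurs across all reads (exact counts)."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weighted_de_bruijn_edges(counts):
    """Map each k-mer to an edge (prefix, suffix) weighted by its abundance.

    Assumption: nodes are (k-1)-mers and every observed k-mer is one edge.
    """
    return {(kmer[:-1], kmer[1:]): n for kmer, n in counts.items()}


# Toy input, made up for illustration.
reads = ["ACGTAC", "CGTACG"]
counts = kmer_abundances(reads, k=3)
edges = weighted_de_bruijn_edges(counts)
```

An exact dictionary like this stores every k-mer in full, which is the space cost that compact representations such as the one described in this record aim to avoid.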