QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads

Abstract Motivation The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a cl...

Bibliographic details
Published in: Bioinformatics 2022-06, Vol.38 (12), p.3192-3199
Main authors: Jiao, Xiaoli; Imamichi, Hiromi; Sherman, Brad T; Nahar, Rishub; Dewar, Robin L; Lane, H Clifford; Imamichi, Tomozumi; Chang, Weizhong
Format: Article
Language: eng
container_end_page 3199
container_issue 12
container_start_page 3192
container_title Bioinformatics
container_volume 38
creator Jiao, Xiaoli
Imamichi, Hiromi
Sherman, Brad T
Nahar, Rishub
Dewar, Robin L
Lane, H Clifford
Imamichi, Tomozumi
Chang, Weizhong
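description Abstract Motivation The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. The challenge is high similarity of quasispecies and high error rate of long sequencing reads. Results We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy <80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the requirement of the computational resource can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills. Availability and implementation QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542. Supplementary information Supplementary data are available at Bioinformatics online.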
doi_str_mv 10.1093/bioinformatics/btac313
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9890302</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btac313</oup_id><sourcerecordid>2661487904</sourcerecordid><originalsourceid>FETCH-LOGICAL-c456t-ea6f7cd074af167b838f225471a2c4c52a0696fedbd5a68b9a559121c33feb9f3</originalsourceid><addsrcrecordid>eNqNkUtv1TAQhS0EoqXwF6os2YT6nZgFElTlIVUCBKytiTNujXzjXDsp4t8T614qumNl65xvjsc6hJwz-opRIy6GkMLkU97BEly5GBZwgolH5JRJTVtOlXm83YXuWtlTcUKelfKTUsWklE_JiVBKcNp3pyR-XaGEb7h_3cw5-RDDdNPchQyx2VenzOgClk2CpmD07bJOFan6UikX17JgrtqvsNw2X8C9C6mJqUK4X3Fy1csIY3lOnniIBV8czzPy4_3V98uP7fXnD58u3163Tiq9tAjad26knQTPdDf0ovecK9kx4E46xYFqoz2Ow6hA94MBpQzjzAnhcTBenJE3h9x5HXY4OpzqqnbOYQf5t00Q7ENnCrf2Jt1Z0xsqKN8CXh4Dctq-UBa7C8VhjDBhWovlWjPZd4bKDdUH1OVUSkZ__wyjtlZlH1Zlj1Vtg-f_Lnk_9rebDWAHIK3z_4b-Aee_q3I</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2661487904</pqid></control><display><type>article</type><title>QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads</title><source>Oxford Journals Open Access Collection</source><creator>Jiao, Xiaoli ; Imamichi, Hiromi ; Sherman, Brad T ; Nahar, Rishub ; Dewar, Robin L ; Lane, H Clifford ; Imamichi, Tomozumi ; Chang, Weizhong</creator><contributor>Mathelier, Anthony</contributor><creatorcontrib>Jiao, Xiaoli ; Imamichi, Hiromi ; Sherman, Brad T ; Nahar, Rishub ; Dewar, Robin L ; Lane, H Clifford ; Imamichi, Tomozumi ; Chang, Weizhong ; Mathelier, Anthony</creatorcontrib><description>Abstract Motivation The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. 
The challenge is high similarity of quasispecies and high error rate of long sequencing reads. Results We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy &lt;80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the requirement of the computational resource can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills. Availability and implementation QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>ISSN: 1367-4811</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btac313</identifier><identifier>PMID: 35532087</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Cluster Analysis ; High-Throughput Nucleotide Sequencing ; Original Papers ; Quasispecies ; Sequence Analysis, DNA ; Software</subject><ispartof>Bioinformatics, 2022-06, Vol.38 (12), p.3192-3199</ispartof><rights>Published by Oxford University Press 2022. 2022</rights><rights>Published by Oxford University Press 2022.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c456t-ea6f7cd074af167b838f225471a2c4c52a0696fedbd5a68b9a559121c33feb9f3</citedby><cites>FETCH-LOGICAL-c456t-ea6f7cd074af167b838f225471a2c4c52a0696fedbd5a68b9a559121c33feb9f3</cites><orcidid>0000-0002-1413-2763</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890302/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890302/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,1604,27923,27924,53790,53792</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btac313$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35532087$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Mathelier, Anthony</contributor><creatorcontrib>Jiao, 
Xiaoli</creatorcontrib><creatorcontrib>Imamichi, Hiromi</creatorcontrib><creatorcontrib>Sherman, Brad T</creatorcontrib><creatorcontrib>Nahar, Rishub</creatorcontrib><creatorcontrib>Dewar, Robin L</creatorcontrib><creatorcontrib>Lane, H Clifford</creatorcontrib><creatorcontrib>Imamichi, Tomozumi</creatorcontrib><creatorcontrib>Chang, Weizhong</creatorcontrib><title>QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. The challenge is high similarity of quasispecies and high error rate of long sequencing reads. Results We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy &lt;80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the requirement of the computational resource can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. 
Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills. Availability and implementation QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542. Supplementary information Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Cluster Analysis</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Original Papers</subject><subject>Quasispecies</subject><subject>Sequence Analysis, DNA</subject><subject>Software</subject><issn>1367-4803</issn><issn>1367-4811</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkUtv1TAQhS0EoqXwF6os2YT6nZgFElTlIVUCBKytiTNujXzjXDsp4t8T614qumNl65xvjsc6hJwz-opRIy6GkMLkU97BEly5GBZwgolH5JRJTVtOlXm83YXuWtlTcUKelfKTUsWklE_JiVBKcNp3pyR-XaGEb7h_3cw5-RDDdNPchQyx2VenzOgClk2CpmD07bJOFan6UikX17JgrtqvsNw2X8C9C6mJqUK4X3Fy1csIY3lOnniIBV8czzPy4_3V98uP7fXnD58u3163Tiq9tAjad26knQTPdDf0ovecK9kx4E46xYFqoz2Ow6hA94MBpQzjzAnhcTBenJE3h9x5HXY4OpzqqnbOYQf5t00Q7ENnCrf2Jt1Z0xsqKN8CXh4Dctq-UBa7C8VhjDBhWovlWjPZd4bKDdUH1OVUSkZ__wyjtlZlH1Zlj1Vtg-f_Lnk_9rebDWAHIK3z_4b-Aee_q3I</recordid><startdate>20220613</startdate><enddate>20220613</enddate><creator>Jiao, Xiaoli</creator><creator>Imamichi, Hiromi</creator><creator>Sherman, Brad T</creator><creator>Nahar, Rishub</creator><creator>Dewar, Robin L</creator><creator>Lane, H Clifford</creator><creator>Imamichi, Tomozumi</creator><creator>Chang, Weizhong</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-1413-2763</orcidid></search><sort><creationdate>20220613</creationdate><title>QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads</title><author>Jiao, Xiaoli ; Imamichi, Hiromi ; Sherman, Brad T ; Nahar, Rishub ; Dewar, Robin L ; Lane, H Clifford ; Imamichi, Tomozumi ; Chang, Weizhong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c456t-ea6f7cd074af167b838f225471a2c4c52a0696fedbd5a68b9a559121c33feb9f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Cluster Analysis</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Original Papers</topic><topic>Quasispecies</topic><topic>Sequence Analysis, DNA</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiao, Xiaoli</creatorcontrib><creatorcontrib>Imamichi, Hiromi</creatorcontrib><creatorcontrib>Sherman, Brad T</creatorcontrib><creatorcontrib>Nahar, Rishub</creatorcontrib><creatorcontrib>Dewar, Robin L</creatorcontrib><creatorcontrib>Lane, H Clifford</creatorcontrib><creatorcontrib>Imamichi, Tomozumi</creatorcontrib><creatorcontrib>Chang, Weizhong</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jiao, Xiaoli</au><au>Imamichi, Hiromi</au><au>Sherman, Brad T</au><au>Nahar, Rishub</au><au>Dewar, Robin L</au><au>Lane, H Clifford</au><au>Imamichi, Tomozumi</au><au>Chang, Weizhong</au><au>Mathelier, Anthony</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2022-06-13</date><risdate>2022</risdate><volume>38</volume><issue>12</issue><spage>3192</spage><epage>3199</epage><pages>3192-3199</pages><issn>1367-4803</issn><issn>1367-4811</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation The existence of quasispecies in the viral population causes difficulties for disease prevention and treatment. High-throughput sequencing provides opportunity to determine rare quasispecies and long sequencing reads covering full genomes reduce quasispecies determination to a clustering problem. The challenge is high similarity of quasispecies and high error rate of long sequencing reads. Results We developed QuasiSeq using a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even using low-quality sequencing reads (accuracy &lt;80%) and produce quasispecies sequences with high accuracy (≥99.55%). Using high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the requirement of the computational resource can be controlled by the size of the signature, which makes it possible to handle big sequencing data for rare quasispecies discovery. 
Furthermore, parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of sequencing data, making it easy to use for users without advanced data science skills. Availability and implementation QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>35532087</pmid><doi>10.1093/bioinformatics/btac313</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-1413-2763</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2022-06, Vol.38 (12), p.3192-3199
issn 1367-4803
1367-4811
1460-2059
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9890302
source Oxford Journals Open Access Collection
subjects Algorithms
Cluster Analysis
High-Throughput Nucleotide Sequencing
Original Papers
Quasispecies
Sequence Analysis, DNA
Software
title QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T17%3A52%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=QuasiSeq:%20profiling%20viral%20quasispecies%20via%20self-tuning%20spectral%20clustering%20with%20PacBio%20long%20sequencing%20reads&rft.jtitle=Bioinformatics&rft.au=Jiao,%20Xiaoli&rft.date=2022-06-13&rft.volume=38&rft.issue=12&rft.spage=3192&rft.epage=3199&rft.pages=3192-3199&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btac313&rft_dat=%3Cproquest_TOX%3E2661487904%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2661487904&rft_id=info:pmid/35532087&rft_oup_id=10.1093/bioinformatics/btac313&rfr_iscdi=true