Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing

Nanopore sequencing device analysis systems simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. Squiggles are a noisy and time-distorted representation of an underlying nucl...

Detailed description

Bibliographic details
Published in: PLoS computational biology 2021-09, Vol.17 (9), p.e1009350-e1009350
Main authors: Smith, Michael; Chan, Rachel; Khurram, Maaz; Gordon, Paul M K
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e1009350
container_issue 9
container_start_page e1009350
container_title PLoS computational biology
container_volume 17
creator Smith, Michael
Chan, Rachel
Khurram, Maaz
Gordon, Paul M K
description Nanopore sequencing device analysis systems simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. These signals, called squiggles, are a noisy and time-distorted representation of an underlying nucleotide sequence (the "gold standard model") because of experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features that standard linear-averaging approaches distort. We compared the ability of the DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of the RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard models. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals with the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the improved match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: first, the least squares warped distance between the final consensus and the gold model decreases, and second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high-entropy signals.
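The abstract's contrast between warped-space and linear averaging can be illustrated with a minimal Python sketch of the generic DTW Barycentre averaging (DBA) idea. This is not the authors' implementation: the function names (`dtw_path`, `dba_update`, `dba`), the squared-error local cost, the fixed iteration count and the toy signals are all assumptions made for illustration, and the MM and SSG algorithms and the voting scheme are not shown.

```python
import math


def dtw_path(a, b):
    """Return (cost, path) for a squared-error DTW alignment of a and b.

    The cost is the summed squared difference along the optimal warping
    path, i.e. a least-squares warped distance (assumed local cost).
    """
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


def dba_update(consensus, ensemble):
    """One DBA step: average the samples that DTW aligns to each consensus point."""
    sums = [0.0] * len(consensus)
    counts = [0] * len(consensus)
    for signal in ensemble:
        _, path = dtw_path(consensus, signal)
        for ci, si in path:
            sums[ci] += signal[si]
            counts[ci] += 1
    # Every consensus index lies on each warping path, so counts are non-zero.
    return [s / c for s, c in zip(sums, counts)]


def dba(ensemble, init, iterations=10):
    """Refine an initial consensus for a fixed (assumed) number of iterations."""
    consensus = list(init)
    for _ in range(iterations):
        consensus = dba_update(consensus, ensemble)
    return consensus


# Toy ensemble: the same step feature, time-warped three different ways.
ensemble = [[0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
consensus = dba(ensemble, init=ensemble[0])
```

In this toy case a point-by-point mean of the three signals flattens the step to a height of 2/3, whereas the warped-space average keeps it at full height, which is the kind of feature preservation the abstract attributes to DTWA methods. The cost returned by `dtw_path` could likewise stand in for a least-squares warped distance between a consensus and a reference signal, under the same assumptions.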
doi_str_mv 10.1371/journal.pcbi.1009350
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2582586736</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A677502913</galeid><doaj_id>oai_doaj_org_article_c93dfe1860d04d21b65897a745226eb3</doaj_id><sourcerecordid>A677502913</sourcerecordid><originalsourceid>FETCH-LOGICAL-c582t-644a399a3a5365c704d76b00732cd22e8323e5766d68ac05f6f383ca469d7e353</originalsourceid><addsrcrecordid>eNqVk89u1DAQxiMEoqXwBggscYHDLk4c28kFaVUKrFSBBEUcLceZZF0ldrCTFft4vBmz3T_qIi4oh1jj3_fNeEaTJM9TOk-ZTN_e-ik43c0HU9l5SmnJOH2QnKecs5lkvHh473yWPInxllI8luJxcsZyTkUuy_Pk99Vad5MerWvJuAICTQNmtGtwECPxDQEXoa86IGt_B1lHbD8Evz4otDFT0GazhY1H2sUpkmhbLC4SJOvJQE2qDVnrYD3evb_5sSC6a32w46qPpAm-J3GEYYZOAdx4VLdYRtAjyuspbBM67fzgA5AIPydwBmNPk0cNsvBs_79Ivn-4urn8NLv-8nF5ubieGV5k40zkuWZlqZnmTHAjaV5LUVEqWWbqLIOCZQy4FKIWhTaUN6JhBTM6F2UtgXF2kbzc-Q6dj2rf_agydOeFkEwgsdwRtde3agi212GjvLbqLuBDq3QYrelAmZLVDaSFoDUWkqWVwMlILXOeZQIqhl7v9tmmqofaYFeC7k5MT2-cXanWr1WRc4nDRYPXe4PgsVVxVL2NBrpOO8AhYN0yLVMp5BZ99Rf679fNd1Sr8QHWNR7zGvxq6C0OHhqL8YWQmD4r0-0T3pwIkBnh19jqKUa1_Pb1P9jPp2y-Y03wMQZojl1JqdquxqF8tV0NtV8NlL2439Gj6LAL7A8U3A9Y</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2582586736</pqid></control><display><type>article</type><title>Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Public Library of Science (PLoS)</source><creator>Smith, Michael ; Chan, Rachel ; Khurram, Maaz ; Gordon, Paul M K</creator><creatorcontrib>Smith, Michael ; Chan, Rachel ; Khurram, Maaz ; Gordon, Paul M K</creatorcontrib><description>Nanopore sequencing device analysis systems 
simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. Squiggles are a noisy and time-distorted representation of an underlying nucleotide sequence, "gold standard model", due to experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features distorted by standard, linear-averaging approaches. We compared the ability of DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard model. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals and the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the increase in the match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: First is the decreased least squares warped distance between the final consensus and the gold model, and second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. 
The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high entropy signals.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1009350</identifier><identifier>PMID: 34506479</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Accuracy ; Algorithms ; Biology and Life Sciences ; Biomolecules ; Computational Biology ; Computer Simulation ; Consensus Sequence ; Deoxyribonucleic acid ; DNA ; DNA sequencing ; Engineering and Technology ; Entropy ; Epigenetics ; Evaluation ; Funding ; Magnetic tape ; Methods ; Molecular motors ; Nanopore Sequencing - statistics &amp; numerical data ; Nanotechnology ; Neural networks ; Nucleotide sequence ; Nucleotide sequencing ; Nucleotides ; Phosphopyruvate hydratase ; Phosphopyruvate Hydratase - genetics ; Physical Sciences ; Research and Analysis Methods ; Ribonucleic acid ; RNA ; RNA - genetics ; RNA sequencing ; Signal processing ; Signal-To-Noise Ratio ; Software ; Stochastic Processes ; Stochasticity ; Voting</subject><ispartof>PLoS computational biology, 2021-09, Vol.17 (9), p.e1009350-e1009350</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Smith et al 2021 Smith et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c582t-644a399a3a5365c704d76b00732cd22e8323e5766d68ac05f6f383ca469d7e353</cites><orcidid>0000-0002-5820-1919 ; 0000-0003-2881-1713 ; 0000-0003-2363-632X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8457506/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8457506/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79342,79343</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34506479$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Smith, Michael</creatorcontrib><creatorcontrib>Chan, Rachel</creatorcontrib><creatorcontrib>Khurram, Maaz</creatorcontrib><creatorcontrib>Gordon, Paul M K</creatorcontrib><title>Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Nanopore sequencing device analysis systems simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. 
Squiggles are a noisy and time-distorted representation of an underlying nucleotide sequence, "gold standard model", due to experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features distorted by standard, linear-averaging approaches. We compared the ability of DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard model. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals and the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the increase in the match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: First is the decreased least squares warped distance between the final consensus and the gold model, and second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. 
This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high entropy signals.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Biology and Life Sciences</subject><subject>Biomolecules</subject><subject>Computational Biology</subject><subject>Computer Simulation</subject><subject>Consensus Sequence</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA sequencing</subject><subject>Engineering and Technology</subject><subject>Entropy</subject><subject>Epigenetics</subject><subject>Evaluation</subject><subject>Funding</subject><subject>Magnetic tape</subject><subject>Methods</subject><subject>Molecular motors</subject><subject>Nanopore Sequencing - statistics &amp; numerical data</subject><subject>Nanotechnology</subject><subject>Neural networks</subject><subject>Nucleotide sequence</subject><subject>Nucleotide sequencing</subject><subject>Nucleotides</subject><subject>Phosphopyruvate hydratase</subject><subject>Phosphopyruvate Hydratase - genetics</subject><subject>Physical Sciences</subject><subject>Research and Analysis Methods</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>RNA - genetics</subject><subject>RNA sequencing</subject><subject>Signal processing</subject><subject>Signal-To-Noise Ratio</subject><subject>Software</subject><subject>Stochastic 
Processes</subject><subject>Stochasticity</subject><subject>Voting</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqVk89u1DAQxiMEoqXwBggscYHDLk4c28kFaVUKrFSBBEUcLceZZF0ldrCTFft4vBmz3T_qIi4oh1jj3_fNeEaTJM9TOk-ZTN_e-ik43c0HU9l5SmnJOH2QnKecs5lkvHh473yWPInxllI8luJxcsZyTkUuy_Pk99Vad5MerWvJuAICTQNmtGtwECPxDQEXoa86IGt_B1lHbD8Evz4otDFT0GazhY1H2sUpkmhbLC4SJOvJQE2qDVnrYD3evb_5sSC6a32w46qPpAm-J3GEYYZOAdx4VLdYRtAjyuspbBM67fzgA5AIPydwBmNPk0cNsvBs_79Ivn-4urn8NLv-8nF5ubieGV5k40zkuWZlqZnmTHAjaV5LUVEqWWbqLIOCZQy4FKIWhTaUN6JhBTM6F2UtgXF2kbzc-Q6dj2rf_agydOeFkEwgsdwRtde3agi212GjvLbqLuBDq3QYrelAmZLVDaSFoDUWkqWVwMlILXOeZQIqhl7v9tmmqofaYFeC7k5MT2-cXanWr1WRc4nDRYPXe4PgsVVxVL2NBrpOO8AhYN0yLVMp5BZ99Rf679fNd1Sr8QHWNR7zGvxq6C0OHhqL8YWQmD4r0-0T3pwIkBnh19jqKUa1_Pb1P9jPp2y-Y03wMQZojl1JqdquxqF8tV0NtV8NlL2439Gj6LAL7A8U3A9Y</recordid><startdate>20210901</startdate><enddate>20210901</enddate><creator>Smith, Michael</creator><creator>Chan, Rachel</creator><creator>Khurram, Maaz</creator><creator>Gordon, Paul M K</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-5820-1919</orcidid><orcidid>https://orcid.org/0000-0003-2881-1713</orcidid><orcidid>https://orcid.org/0000-0003-2363-632X</orcidid></search><sort><creationdate>20210901</creationdate><title>Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing</title><author>Smith, Michael ; Chan, Rachel ; Khurram, Maaz ; Gordon, Paul M 
K</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c582t-644a399a3a5365c704d76b00732cd22e8323e5766d68ac05f6f383ca469d7e353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Biology and Life Sciences</topic><topic>Biomolecules</topic><topic>Computational Biology</topic><topic>Computer Simulation</topic><topic>Consensus Sequence</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA sequencing</topic><topic>Engineering and Technology</topic><topic>Entropy</topic><topic>Epigenetics</topic><topic>Evaluation</topic><topic>Funding</topic><topic>Magnetic tape</topic><topic>Methods</topic><topic>Molecular motors</topic><topic>Nanopore Sequencing - statistics &amp; numerical data</topic><topic>Nanotechnology</topic><topic>Neural networks</topic><topic>Nucleotide sequence</topic><topic>Nucleotide sequencing</topic><topic>Nucleotides</topic><topic>Phosphopyruvate hydratase</topic><topic>Phosphopyruvate Hydratase - genetics</topic><topic>Physical Sciences</topic><topic>Research and Analysis Methods</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>RNA - genetics</topic><topic>RNA sequencing</topic><topic>Signal processing</topic><topic>Signal-To-Noise Ratio</topic><topic>Software</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><topic>Voting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Smith, Michael</creatorcontrib><creatorcontrib>Chan, Rachel</creatorcontrib><creatorcontrib>Khurram, Maaz</creatorcontrib><creatorcontrib>Gordon, Paul M K</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: 
Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health 
&amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Smith, Michael</au><au>Chan, Rachel</au><au>Khurram, Maaz</au><au>Gordon, Paul M K</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2021-09-01</date><risdate>2021</risdate><volume>17</volume><issue>9</issue><spage>e1009350</spage><epage>e1009350</epage><pages>e1009350-e1009350</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Nanopore sequencing device analysis systems simultaneously generate 
multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. Squiggles are a noisy and time-distorted representation of an underlying nucleotide sequence, "gold standard model", due to experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features distorted by standard, linear-averaging approaches. We compared the ability of DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard model. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals and the best estimate of the "true" consensus, the study's gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel "voting scheme" that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the increase in the match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: First is the decreased least squares warped distance between the final consensus and the gold model, and second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. 
The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high entropy signals.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>34506479</pmid><doi>10.1371/journal.pcbi.1009350</doi><orcidid>https://orcid.org/0000-0002-5820-1919</orcidid><orcidid>https://orcid.org/0000-0003-2881-1713</orcidid><orcidid>https://orcid.org/0000-0003-2363-632X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2021-09, Vol.17 (9), p.e1009350-e1009350
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2582586736
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Public Library of Science (PLoS)
subjects Accuracy
Algorithms
Biology and Life Sciences
Biomolecules
Computational Biology
Computer Simulation
Consensus Sequence
Deoxyribonucleic acid
DNA
DNA sequencing
Engineering and Technology
Entropy
Epigenetics
Evaluation
Funding
Magnetic tape
Methods
Molecular motors
Nanopore Sequencing - statistics & numerical data
Nanotechnology
Neural networks
Nucleotide sequence
Nucleotide sequencing
Nucleotides
Phosphopyruvate hydratase
Phosphopyruvate Hydratase - genetics
Physical Sciences
Research and Analysis Methods
Ribonucleic acid
RNA
RNA - genetics
RNA sequencing
Signal processing
Signal-To-Noise Ratio
Software
Stochastic Processes
Stochasticity
Voting
title Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T00%3A39%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluating%20the%20effectiveness%20of%20ensemble%20voting%20in%20improving%20the%20accuracy%20of%20consensus%20signals%20produced%20by%20various%20DTWA%20algorithms%20from%20step-current%20signals%20generated%20during%20nanopore%20sequencing&rft.jtitle=PLoS%20computational%20biology&rft.au=Smith,%20Michael&rft.date=2021-09-01&rft.volume=17&rft.issue=9&rft.spage=e1009350&rft.epage=e1009350&rft.pages=e1009350-e1009350&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1009350&rft_dat=%3Cgale_plos_%3EA677502913%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2582586736&rft_id=info:pmid/34506479&rft_galeid=A677502913&rft_doaj_id=oai_doaj_org_article_c93dfe1860d04d21b65897a745226eb3&rfr_iscdi=true