Remote homology search with hidden Potts models

Bibliographic details
Published in: PLoS computational biology 2020-11, Vol.16 (11), p.e1008085-e1008085
Main authors: Wilburn, Grey W; Eddy, Sean R
Format: Article
Language: English
Online access: Full text
container_end_page e1008085
container_issue 11
container_start_page e1008085
container_title PLoS computational biology
container_volume 16
creator Wilburn, Grey W
Eddy, Sean R
description Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.
doi_str_mv 10.1371/journal.pcbi.1008085
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2479465142</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A643392421</galeid><doaj_id>oai_doaj_org_article_32d74d632ea1404cba458eb2dcd1408e</doaj_id><sourcerecordid>A643392421</sourcerecordid><originalsourceid>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</originalsourceid><addsrcrecordid>eNqVkk1v1DAQhiMEoqXwDxBE4gKH3cafsS9IVVVgpQpQgbPl2JOsV0m8tR1o_z1eNq26iAvywePxM-98aIriJaqWiNTodOOnMOp-uTWNW6KqEpVgj4pjxBhZ1ISJxw_so-JZjJuqyqbkT4sjQjAjiJLj4vQKBp-gXPvB9767LSPoYNblL5fW5dpZC2P51acUy8Fb6OPz4kmr-wgv5vuk-PHh4vv5p8Xll4-r87PLheGEpIXGUuPGNAhz0VRSaI0FlpRTiRjC1ragOW-xxATAICGYFJjWGnHe2FYyTU6K13vdbe-jmnuNKkNZhSGKM7HaE9brjdoGN-hwq7x26o_Dh07pkJzpQRFsa2o5waARrahpNGUCGmyNzW8BWev9nG1qBrAGxhR0fyB6-DO6ter8T1XXWCCxK-btLBD89QQxqcFFA32vR_DTrm7O8_QRkhl98xf67-6We6rTuQE3tj7nNflYGJzxI7Qu-884JURiilEOeHcQkJkEN6nTU4xq9e3qP9jPhyzdsyb4GAO091NBldot4l35areIal7EHPbq4UTvg-42j_wGo3vX-A</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2479465142</pqid></control><display><type>article</type><title>Remote homology search with hidden Potts models</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Wilburn, Grey W ; Eddy, Sean R</creator><contributor>Roy, Sushmita</contributor><creatorcontrib>Wilburn, Grey W ; Eddy, Sean R ; Roy, Sushmita</creatorcontrib><description>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. 
Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1008085</identifier><identifier>PMID: 33253143</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Amino acids ; Biology and life sciences ; Columns (structural) ; Computer and Information Sciences ; Computer Simulation ; Conserved sequence ; Dynamic programming ; Engineering and Technology ; Gene sequencing ; Homology ; Homology (Biology) ; Likelihood Functions ; Markov chains ; Mathematical models ; Models, Statistical ; Nucleic Acid Conformation ; Nucleotide sequence ; Nucleotides ; Polynomials ; Probability ; Proteins ; Research and Analysis Methods ; Ribonucleic acid ; RNA ; Sequence Analysis, RNA - methods ; Transition probabilities</subject><ispartof>PLoS computational biology, 2020-11, Vol.16 (11), p.e1008085-e1008085</ispartof><rights>COPYRIGHT 
2020 Public Library of Science</rights><rights>2020 Wilburn, Eddy. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2020 Wilburn, Eddy 2020 Wilburn, Eddy</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</citedby><cites>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</cites><orcidid>0000-0003-4634-7707 ; 0000-0001-6676-4706</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33253143$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Roy, Sushmita</contributor><creatorcontrib>Wilburn, Grey W</creatorcontrib><creatorcontrib>Eddy, Sean R</creatorcontrib><title>Remote homology search with hidden Potts models</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order 
correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.</description><subject>Algorithms</subject><subject>Amino acids</subject><subject>Biology and life sciences</subject><subject>Columns (structural)</subject><subject>Computer and Information Sciences</subject><subject>Computer Simulation</subject><subject>Conserved sequence</subject><subject>Dynamic programming</subject><subject>Engineering and Technology</subject><subject>Gene sequencing</subject><subject>Homology</subject><subject>Homology (Biology)</subject><subject>Likelihood Functions</subject><subject>Markov chains</subject><subject>Mathematical models</subject><subject>Models, Statistical</subject><subject>Nucleic Acid Conformation</subject><subject>Nucleotide sequence</subject><subject>Nucleotides</subject><subject>Polynomials</subject><subject>Probability</subject><subject>Proteins</subject><subject>Research and Analysis Methods</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>Sequence Analysis, RNA - methods</subject><subject>Transition 
probabilities</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk1v1DAQhiMEoqXwDxBE4gKH3cafsS9IVVVgpQpQgbPl2JOsV0m8tR1o_z1eNq26iAvywePxM-98aIriJaqWiNTodOOnMOp-uTWNW6KqEpVgj4pjxBhZ1ISJxw_so-JZjJuqyqbkT4sjQjAjiJLj4vQKBp-gXPvB9767LSPoYNblL5fW5dpZC2P51acUy8Fb6OPz4kmr-wgv5vuk-PHh4vv5p8Xll4-r87PLheGEpIXGUuPGNAhz0VRSaI0FlpRTiRjC1ragOW-xxATAICGYFJjWGnHe2FYyTU6K13vdbe-jmnuNKkNZhSGKM7HaE9brjdoGN-hwq7x26o_Dh07pkJzpQRFsa2o5waARrahpNGUCGmyNzW8BWev9nG1qBrAGxhR0fyB6-DO6ter8T1XXWCCxK-btLBD89QQxqcFFA32vR_DTrm7O8_QRkhl98xf67-6We6rTuQE3tj7nNflYGJzxI7Qu-884JURiilEOeHcQkJkEN6nTU4xq9e3qP9jPhyzdsyb4GAO091NBldot4l35areIal7EHPbq4UTvg-42j_wGo3vX-A</recordid><startdate>20201130</startdate><enddate>20201130</enddate><creator>Wilburn, Grey W</creator><creator>Eddy, Sean R</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-4634-7707</orcidid><orcidid>https://orcid.org/0000-0001-6676-4706</orcidid></search><sort><creationdate>20201130</creationdate><title>Remote homology search with hidden Potts models</title><author>Wilburn, Grey W ; Eddy, Sean R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Amino acids</topic><topic>Biology and life sciences</topic><topic>Columns (structural)</topic><topic>Computer and Information Sciences</topic><topic>Computer Simulation</topic><topic>Conserved sequence</topic><topic>Dynamic 
programming</topic><topic>Engineering and Technology</topic><topic>Gene sequencing</topic><topic>Homology</topic><topic>Homology (Biology)</topic><topic>Likelihood Functions</topic><topic>Markov chains</topic><topic>Mathematical models</topic><topic>Models, Statistical</topic><topic>Nucleic Acid Conformation</topic><topic>Nucleotide sequence</topic><topic>Nucleotides</topic><topic>Polynomials</topic><topic>Probability</topic><topic>Proteins</topic><topic>Research and Analysis Methods</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>Sequence Analysis, RNA - methods</topic><topic>Transition probabilities</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wilburn, Grey W</creatorcontrib><creatorcontrib>Eddy, Sean R</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest 
Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full 
Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wilburn, Grey W</au><au>Eddy, Sean R</au><au>Roy, Sushmita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Remote homology search with hidden Potts models</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2020-11-30</date><risdate>2020</risdate><volume>16</volume><issue>11</issue><spage>e1008085</spage><epage>e1008085</epage><pages>e1008085-e1008085</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). 
HPMs perform promisingly in these proof of principle experiments.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>33253143</pmid><doi>10.1371/journal.pcbi.1008085</doi><orcidid>https://orcid.org/0000-0003-4634-7707</orcidid><orcidid>https://orcid.org/0000-0001-6676-4706</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2020-11, Vol.16 (11), p.e1008085-e1008085
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2479465142
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Algorithms
Amino acids
Biology and life sciences
Columns (structural)
Computer and Information Sciences
Computer Simulation
Conserved sequence
Dynamic programming
Engineering and Technology
Gene sequencing
Homology
Homology (Biology)
Likelihood Functions
Markov chains
Mathematical models
Models, Statistical
Nucleic Acid Conformation
Nucleotide sequence
Nucleotides
Polynomials
Probability
Proteins
Research and Analysis Methods
Ribonucleic acid
RNA
Sequence Analysis, RNA - methods
Transition probabilities
title Remote homology search with hidden Potts models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A05%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Remote%20homology%20search%20with%20hidden%20Potts%20models&rft.jtitle=PLoS%20computational%20biology&rft.au=Wilburn,%20Grey%20W&rft.date=2020-11-30&rft.volume=16&rft.issue=11&rft.spage=e1008085&rft.epage=e1008085&rft.pages=e1008085-e1008085&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1008085&rft_dat=%3Cgale_plos_%3EA643392421%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2479465142&rft_id=info:pmid/33253143&rft_galeid=A643392421&rft_doaj_id=oai_doaj_org_article_32d74d632ea1404cba458eb2dcd1408e&rfr_iscdi=true
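The record's description summarizes the key computational idea. Under a hidden Potts model, the probability of a sequence is a sum over its possible alignments to the model, and that sum does not reduce to efficient dynamic programming. It is therefore approximated by importance sampling, with a simpler probabilistic model serving as the proposal. The toy Python sketch below illustrates only that idea. It is not the authors' HPM or their code.

Everything in it is a hypothetical assumption: the three-column consensus, the two-letter alphabet with "-" marking a deleted column, the absence of insertion states, the made-up fields `h` and couplings `J`, and the site-independent proposal whose alignment posterior is enumerated exactly rather than sampled from a real alignment model. The Potts weight uses the standard form exp(sum_i h_i(a_i) + sum_{i<j} J_ij(a_i, a_j)), normalized here by brute force.

```python
import itertools
import math
import random

# Hypothetical toy alphabet: two residues plus "-" for a deleted consensus column.
ALPHABET = "AC-"


def potts_log_weight(row, h, J):
    """Unnormalized Potts log-weight: sum_i h[i][a_i] + sum_{i<j} J[i][j][a_i][a_j]."""
    idx = [ALPHABET.index(c) for c in row]
    score = sum(h[i][a] for i, a in enumerate(idx))
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            score += J[i][j][idx[i]][idx[j]]
    return score


def indep_log_weight(row, h_q):
    """Site-independent log-weight (fields only, no couplings), used for the proposal."""
    return sum(h_q[i][ALPHABET.index(c)] for i, c in enumerate(row))


def log_partition(h, J, L):
    """Exact log Z by enumerating all len(ALPHABET)**L rows; feasible only for toy L."""
    return math.log(sum(math.exp(potts_log_weight(r, h, J))
                        for r in itertools.product(ALPHABET, repeat=L)))


def alignments(x, L):
    """Yield each order-preserving placement of x's residues into L consensus columns,
    as an aligned row with "-" in unused columns. Insertions are not modeled here."""
    for cols in itertools.combinations(range(L), len(x)):
        row = ["-"] * L
        for c, residue in zip(cols, x):
            row[c] = residue
        yield tuple(row)


def exact_log_prob(x, h, J, L):
    """log P(x): log of the sum over alignments of P_Potts(aligned row), by enumeration."""
    log_z = log_partition(h, J, L)
    total = sum(math.exp(potts_log_weight(r, h, J) - log_z) for r in alignments(x, L))
    return math.log(total) if total > 0 else float("-inf")


def is_log_prob(x, h, J, L, h_q, n_samples, rng):
    """Importance-sampling estimate of log P(x).

    Proposal Q(row | x): posterior over alignments of x under a site-independent
    model with fields h_q. Estimator: mean of P_Potts(row) / Q(row | x) over draws.
    """
    rows = list(alignments(x, L))
    if not rows:
        return float("-inf")
    q_w = [math.exp(indep_log_weight(r, h_q)) for r in rows]
    q_norm = sum(q_w)
    log_z = log_partition(h, J, L)
    draws = rng.choices(range(len(rows)), weights=q_w, k=n_samples)
    ratios = [math.exp(potts_log_weight(rows[k], h, J) - log_z) / (q_w[k] / q_norm)
              for k in draws]
    return math.log(sum(ratios) / n_samples)


# Hypothetical parameters: L = 3 columns, fields h[i][symbol], couplings J[i][j][a][b].
L = 3
Q = len(ALPHABET)
h = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.5], [0.2, 0.2, 0.0]]
J = [[[[0.0] * Q for _ in range(Q)] for _ in range(L)] for _ in range(L)]
J[0][2][ALPHABET.index("A")][ALPHABET.index("C")] = 1.5  # A at column 0 favors C at column 2

rng = random.Random(0)
print("exact  :", exact_log_prob("AC", h, J, L))
print("sampled:", is_log_prob("AC", h, J, L, h_q=h, n_samples=5000, rng=rng))
```

If every coupling is set to zero, the proposal equals the exact alignment posterior. Every sampled ratio then equals P(x), and the estimate is exact. With couplings present, the ratios vary from draw to draw, and the estimate only converges as `n_samples` grows. That is why, in a setup like this, the choice of proposal model matters.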