Remote homology search with hidden Potts models
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignme...
Published in: | PLoS computational biology 2020-11, Vol.16 (11), p.e1008085-e1008085 |
---|---|
Main authors: | Wilburn, Grey W; Eddy, Sean R |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1008085 |
---|---|
container_issue | 11 |
container_start_page | e1008085 |
container_title | PLoS computational biology |
container_volume | 16 |
creator | Wilburn, Grey W; Eddy, Sean R |
description | Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments. |
doi_str_mv | 10.1371/journal.pcbi.1008085 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2479465142</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A643392421</galeid><doaj_id>oai_doaj_org_article_32d74d632ea1404cba458eb2dcd1408e</doaj_id><sourcerecordid>A643392421</sourcerecordid><originalsourceid>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</originalsourceid><addsrcrecordid>eNqVkk1v1DAQhiMEoqXwDxBE4gKH3cafsS9IVVVgpQpQgbPl2JOsV0m8tR1o_z1eNq26iAvywePxM-98aIriJaqWiNTodOOnMOp-uTWNW6KqEpVgj4pjxBhZ1ISJxw_so-JZjJuqyqbkT4sjQjAjiJLj4vQKBp-gXPvB9767LSPoYNblL5fW5dpZC2P51acUy8Fb6OPz4kmr-wgv5vuk-PHh4vv5p8Xll4-r87PLheGEpIXGUuPGNAhz0VRSaI0FlpRTiRjC1ragOW-xxATAICGYFJjWGnHe2FYyTU6K13vdbe-jmnuNKkNZhSGKM7HaE9brjdoGN-hwq7x26o_Dh07pkJzpQRFsa2o5waARrahpNGUCGmyNzW8BWev9nG1qBrAGxhR0fyB6-DO6ter8T1XXWCCxK-btLBD89QQxqcFFA32vR_DTrm7O8_QRkhl98xf67-6We6rTuQE3tj7nNflYGJzxI7Qu-884JURiilEOeHcQkJkEN6nTU4xq9e3qP9jPhyzdsyb4GAO091NBldot4l35areIal7EHPbq4UTvg-42j_wGo3vX-A</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2479465142</pqid></control><display><type>article</type><title>Remote homology search with hidden Potts models</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Wilburn, Grey W ; Eddy, Sean R</creator><contributor>Roy, Sushmita</contributor><creatorcontrib>Wilburn, Grey W ; Eddy, Sean R ; Roy, Sushmita</creatorcontrib><description>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. 
Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1008085</identifier><identifier>PMID: 33253143</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Amino acids ; Biology and life sciences ; Columns (structural) ; Computer and Information Sciences ; Computer Simulation ; Conserved sequence ; Dynamic programming ; Engineering and Technology ; Gene sequencing ; Homology ; Homology (Biology) ; Likelihood Functions ; Markov chains ; Mathematical models ; Models, Statistical ; Nucleic Acid Conformation ; Nucleotide sequence ; Nucleotides ; Polynomials ; Probability ; Proteins ; Research and Analysis Methods ; Ribonucleic acid ; RNA ; Sequence Analysis, RNA - methods ; Transition probabilities</subject><ispartof>PLoS computational biology, 2020-11, Vol.16 (11), p.e1008085-e1008085</ispartof><rights>COPYRIGHT 
2020 Public Library of Science</rights><rights>2020 Wilburn, Eddy. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2020 Wilburn, Eddy 2020 Wilburn, Eddy</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</citedby><cites>FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</cites><orcidid>0000-0003-4634-7707 ; 0000-0001-6676-4706</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33253143$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Roy, Sushmita</contributor><creatorcontrib>Wilburn, Grey W</creatorcontrib><creatorcontrib>Eddy, Sean R</creatorcontrib><title>Remote homology search with hidden Potts models</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order 
correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.</description><subject>Algorithms</subject><subject>Amino acids</subject><subject>Biology and life sciences</subject><subject>Columns (structural)</subject><subject>Computer and Information Sciences</subject><subject>Computer Simulation</subject><subject>Conserved sequence</subject><subject>Dynamic programming</subject><subject>Engineering and Technology</subject><subject>Gene sequencing</subject><subject>Homology</subject><subject>Homology (Biology)</subject><subject>Likelihood Functions</subject><subject>Markov chains</subject><subject>Mathematical models</subject><subject>Models, Statistical</subject><subject>Nucleic Acid Conformation</subject><subject>Nucleotide sequence</subject><subject>Nucleotides</subject><subject>Polynomials</subject><subject>Probability</subject><subject>Proteins</subject><subject>Research and Analysis Methods</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>Sequence Analysis, RNA - methods</subject><subject>Transition 
probabilities</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqVkk1v1DAQhiMEoqXwDxBE4gKH3cafsS9IVVVgpQpQgbPl2JOsV0m8tR1o_z1eNq26iAvywePxM-98aIriJaqWiNTodOOnMOp-uTWNW6KqEpVgj4pjxBhZ1ISJxw_so-JZjJuqyqbkT4sjQjAjiJLj4vQKBp-gXPvB9767LSPoYNblL5fW5dpZC2P51acUy8Fb6OPz4kmr-wgv5vuk-PHh4vv5p8Xll4-r87PLheGEpIXGUuPGNAhz0VRSaI0FlpRTiRjC1ragOW-xxATAICGYFJjWGnHe2FYyTU6K13vdbe-jmnuNKkNZhSGKM7HaE9brjdoGN-hwq7x26o_Dh07pkJzpQRFsa2o5waARrahpNGUCGmyNzW8BWev9nG1qBrAGxhR0fyB6-DO6ter8T1XXWCCxK-btLBD89QQxqcFFA32vR_DTrm7O8_QRkhl98xf67-6We6rTuQE3tj7nNflYGJzxI7Qu-884JURiilEOeHcQkJkEN6nTU4xq9e3qP9jPhyzdsyb4GAO091NBldot4l35areIal7EHPbq4UTvg-42j_wGo3vX-A</recordid><startdate>20201130</startdate><enddate>20201130</enddate><creator>Wilburn, Grey W</creator><creator>Eddy, Sean R</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-4634-7707</orcidid><orcidid>https://orcid.org/0000-0001-6676-4706</orcidid></search><sort><creationdate>20201130</creationdate><title>Remote homology search with hidden Potts models</title><author>Wilburn, Grey W ; Eddy, Sean R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c633t-a29a2bcb1268b098aa282946491512ddfea66f2923eec188598247a166bdf95a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Amino acids</topic><topic>Biology and life sciences</topic><topic>Columns (structural)</topic><topic>Computer and Information Sciences</topic><topic>Computer Simulation</topic><topic>Conserved sequence</topic><topic>Dynamic 
programming</topic><topic>Engineering and Technology</topic><topic>Gene sequencing</topic><topic>Homology</topic><topic>Homology (Biology)</topic><topic>Likelihood Functions</topic><topic>Markov chains</topic><topic>Mathematical models</topic><topic>Models, Statistical</topic><topic>Nucleic Acid Conformation</topic><topic>Nucleotide sequence</topic><topic>Nucleotides</topic><topic>Polynomials</topic><topic>Probability</topic><topic>Proteins</topic><topic>Research and Analysis Methods</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>Sequence Analysis, RNA - methods</topic><topic>Transition probabilities</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wilburn, Grey W</creatorcontrib><creatorcontrib>Eddy, Sean R</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central 
(Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant 
titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wilburn, Grey W</au><au>Eddy, Sean R</au><au>Roy, Sushmita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Remote homology search with hidden Potts models</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2020-11-30</date><risdate>2020</risdate><volume>16</volume><issue>11</issue><spage>e1008085</spage><epage>e1008085</epage><pages>e1008085-e1008085</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). 
HPMs perform promisingly in these proof of principle experiments.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>33253143</pmid><doi>10.1371/journal.pcbi.1008085</doi><orcidid>https://orcid.org/0000-0003-4634-7707</orcidid><orcidid>https://orcid.org/0000-0001-6676-4706</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1553-7358 |
ispartof | PLoS computational biology, 2020-11, Vol.16 (11), p.e1008085-e1008085 |
issn | 1553-7358; 1553-734X; 1553-7358 |
language | eng |
recordid | cdi_plos_journals_2479465142 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Algorithms; Amino acids; Biology and life sciences; Columns (structural); Computer and Information Sciences; Computer Simulation; Conserved sequence; Dynamic programming; Engineering and Technology; Gene sequencing; Homology; Homology (Biology); Likelihood Functions; Markov chains; Mathematical models; Models, Statistical; Nucleic Acid Conformation; Nucleotide sequence; Nucleotides; Polynomials; Probability; Proteins; Research and Analysis Methods; Ribonucleic acid; RNA; Sequence Analysis, RNA - methods; Transition probabilities |
title | Remote homology search with hidden Potts models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A05%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Remote%20homology%20search%20with%20hidden%20Potts%20models&rft.jtitle=PLoS%20computational%20biology&rft.au=Wilburn,%20Grey%20W&rft.date=2020-11-30&rft.volume=16&rft.issue=11&rft.spage=e1008085&rft.epage=e1008085&rft.pages=e1008085-e1008085&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1008085&rft_dat=%3Cgale_plos_%3EA643392421%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2479465142&rft_id=info:pmid/33253143&rft_galeid=A643392421&rft_doaj_id=oai_doaj_org_article_32d74d632ea1404cba458eb2dcd1408e&rfr_iscdi=true |
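
The description above mentions an approximate algorithm based on importance sampling that uses simpler probabilistic models as proposal distributions. The record gives no further detail, so the following is only a toy sketch of that general idea, not the authors' method: it estimates the partition function of a tiny Potts model by drawing sequences from a site-independent proposal and averaging the weight ratios. All names (`make_toy_model`, `potts_weight`, `exact_partition`, `importance_sample_partition`), the model size, and the random parameters are assumptions made for illustration. The paper samples alignments rather than whole sequences.

```python
import itertools
import math
import random


def make_toy_model(n_sites, n_states, rng):
    """Build hypothetical random fields h[i][a] and weak couplings e[i][j][a][b] for i < j."""
    h = [[rng.uniform(-1.0, 1.0) for _ in range(n_states)] for _ in range(n_sites)]
    e = [[[[0.0] * n_states for _ in range(n_states)] for _ in range(n_sites)]
         for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            for a in range(n_states):
                for b in range(n_states):
                    e[i][j][a][b] = rng.uniform(-0.3, 0.3)
    return h, e


def potts_weight(seq, h, e):
    """Unnormalized Potts weight: exp(sum_i h_i(a_i) + sum_{i<j} e_ij(a_i, a_j))."""
    total = sum(h[i][a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            total += e[i][j][seq[i]][seq[j]]
    return math.exp(total)


def exact_partition(h, e, n_states):
    """Sum the Potts weight over every sequence (feasible only for tiny models)."""
    n_sites = len(h)
    return sum(potts_weight(seq, h, e)
               for seq in itertools.product(range(n_states), repeat=n_sites))


def importance_sample_partition(h, e, n_states, n_samples, rng):
    """Estimate the partition function as the mean of weight / proposal probability."""
    n_sites = len(h)
    # Proposal: independent sites, each proportional to exp(h_i(a)); couplings are ignored.
    proposal = []
    for i in range(n_sites):
        w = [math.exp(h[i][a]) for a in range(n_states)]
        z = sum(w)
        proposal.append([x / z for x in w])
    total = 0.0
    for _ in range(n_samples):
        seq = [rng.choices(range(n_states), weights=proposal[i])[0]
               for i in range(n_sites)]
        q_prob = math.prod(proposal[i][a] for i, a in enumerate(seq))
        total += potts_weight(seq, h, e) / q_prob
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    h, e = make_toy_model(4, 4, rng)
    print("exact:   ", exact_partition(h, e, 4))
    print("estimate:", importance_sample_partition(h, e, 4, 20000, rng))
```

The estimator is unbiased because the proposal assigns nonzero probability to every sequence. Its variance grows as the couplings get stronger and the proposal drifts away from the target, which is the usual trade-off when a simpler model stands in for a harder one.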