A probabilistic measure for alignment-free sequence comparison
Motivation: Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the...
Published in: | Bioinformatics 2004-12, Vol.20 (18), p.3455-3461 |
---|---|
Main authors: | Pham, Tuan D. ; Zuegg, Johannes |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 3461 |
---|---|
container_issue | 18 |
container_start_page | 3455 |
container_title | Bioinformatics |
container_volume | 20 |
creator | Pham, Tuan D. ; Zuegg, Johannes |
description | Motivation: Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models. Results: The method was tested against six DNA sequences, which are the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri; and one random sequence having the same base composition as thrA from E.coli. These results were compared with those obtained from CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested against a more complex set of 40 DNA sequences and compared with other existing sequence similarity measures (alignment-free). Availability: All datasets and computer codes written in MATLAB are available upon request from the first author. |
doi_str_mv | 10.1093/bioinformatics/bth426 |
format | Article |
pmid | 15271780 |
coden | BOINFP |
publisher | Oxford: Oxford University Press |
date | 2004-12-12 |
tpages | 7 |
rights | 2005 INIST-CNRS; Copyright Oxford University Press(England) Dec 12, 2004 |
toplevel | peer_reviewed; online_resources |
oa | free_for_read |
backlink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16404863 (View record in Pascal Francis); https://www.ncbi.nlm.nih.gov/pubmed/15271780 (View this record in MEDLINE/PubMed) |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2004-12, Vol.20 (18), p.3455-3461 |
issn | 1367-4803; 1460-2059; 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_67175674 |
source | MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection |
subjects | Algorithms; Biological and medical sciences; Computer Simulation; Escherichia coli; Escherichia coli - genetics; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Genetic; Models, Statistical; Operon - genetics; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Sequence Homology, Nucleic Acid; Shigella flexneri; Shigella flexneri - genetics; Threonine - genetics |
title | A probabilistic measure for alignment-free sequence comparison |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T14%3A32%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20probabilistic%20measure%20for%20alignment-free%20sequence%20comparison&rft.jtitle=Bioinformatics&rft.au=Pham,%20Tuan%20D.&rft.date=2004-12-12&rft.volume=20&rft.issue=18&rft.spage=3455&rft.epage=3461&rft.pages=3455-3461&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/bth426&rft_dat=%3Cproquest_cross%3E768641541%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198656370&rft_id=info:pmid/15271780&rfr_iscdi=true |
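
The abstract says only that the method compares the similarity/dissimilarity between two constructed Markov models; this record does not give the formula. As a rough illustration of that general idea, and not the authors' actual measure, the sketch below fits a first-order Markov chain to each DNA sequence and scores the pair by a symmetrised difference in average log-likelihood. The function names (`fit_markov`, `avg_log_likelihood`, `dissimilarity`), the model order, and the add-one pseudocount are all assumptions made for this example.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet; other symbols are skipped


def fit_markov(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) with smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts[a]:
            counts[a][b] += 1.0
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model


def avg_log_likelihood(seq, model):
    """Mean log-probability per observed transition of seq under model."""
    pairs = [(a, b) for a, b in zip(seq, seq[1:])
             if a in model and b in model[a]]
    if not pairs:
        raise ValueError("sequence has no usable transitions")
    return sum(math.log(model[a][b]) for a, b in pairs) / len(pairs)


def dissimilarity(seq_x, seq_y):
    """Symmetrised log-likelihood gap between the two fitted models.

    Illustrative only: larger values mean each sequence is explained
    worse by the other sequence's model than by its own.
    """
    model_x, model_y = fit_markov(seq_x), fit_markov(seq_y)
    gap_xy = avg_log_likelihood(seq_x, model_x) - avg_log_likelihood(seq_x, model_y)
    gap_yx = avg_log_likelihood(seq_y, model_y) - avg_log_likelihood(seq_y, model_x)
    return 0.5 * (gap_xy + gap_yx)


if __name__ == "__main__":
    x = "ACGT" * 20
    y = "AAAACCCCGGGGTTTT" * 5
    print(dissimilarity(x, x))  # identical sequences share one model
    print(dissimilarity(x, y))  # different transition structure
```

Because of the smoothing, the score is not guaranteed to be non-negative for every pair of sequences, so it should be read as a heuristic and not as a metric.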