OrthoMCL: identification of ortholog groups for eukaryotic genomes

Bibliographic details
Published in: Genome Research, 2003-09, Vol. 13 (9), p. 2178-2189
Main authors: Li, Li; Stoeckert, Jr, Christian J; Roos, David S
Format: Article
Language: English
container_end_page 2189
container_issue 9
container_start_page 2178
container_title Genome research
container_volume 13
creator Li, Li
Stoeckert, Jr, Christian J
Roos, David S
description The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
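The abstract names a Markov Cluster (MCL) algorithm as the step that groups putative orthologs and paralogs. The following is a generic, minimal MCL sketch in pure Python (column normalization, expansion, inflation, pruning), not OrthoMCL's own code. The similarity weights, the self-loop convention, the inflation/pruning/tolerance values and the protein IDs (`hs_1`, `dm_1`, ...) are illustrative assumptions; OrthoMCL derives and normalizes its edge weights from BLAST results in its own way.

```python
# Generic Markov Cluster (MCL) sketch in pure Python, for illustration only.
# Assumptions (not taken from OrthoMCL): edge weights are precomputed similarity
# scores, each self-loop copies the node's strongest edge weight, and the
# inflation, pruning and tolerance values are arbitrary illustrative defaults.


def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / totals[j] if totals[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(nodes, edges, inflation=2.0, prune=1e-6, tol=1e-9, max_iter=100):
    """Cluster `nodes` from undirected weighted `edges` given as (a, b, weight)."""
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = float(w)
    for i in range(n):
        # Self-loop weighted like the node's strongest edge (1.0 if isolated).
        m[i][i] = max(m[i]) or 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: flow spreads along walks of length two.
        expanded = matmul(m, m)
        # Inflation: powering then renormalizing boosts strong flows and
        # starves weak ones (e.g. links between distinct gene families).
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        # Pruning: drop negligible entries, then renormalize.
        new = normalize_columns([[v if v > prune else 0.0 for v in row] for row in inflated])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows still holding flow are attractors; their nonzero columns form clusters.
    clusters = {tuple(sorted(nodes[j] for j in range(n) if m[i][j] > 0)) for i in range(n)}
    return sorted(list(c) for c in clusters if c)


if __name__ == "__main__":
    # Hypothetical protein IDs: one three-member family, one two-member family.
    nodes = ["hs_1", "dm_1", "ce_1", "hs_2", "sc_2"]
    edges = [("hs_1", "dm_1", 1.0), ("hs_1", "ce_1", 1.0), ("dm_1", "ce_1", 1.0),
             ("hs_2", "sc_2", 0.5)]
    print(mcl(nodes, edges))  # [['ce_1', 'dm_1', 'hs_1'], ['hs_2', 'sc_2']]
```

Because expansion, inflation and normalization all preserve zero blocks, proteins with no similarity path between them can never land in the same cluster. The inflation parameter controls granularity: higher values tend to yield smaller, tighter clusters.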
doi_str_mv 10.1101/gr.1224503
format Article
pub Cold Spring Harbor Laboratory Press
cop United States
pmid 12952885
rights Copyright © 2003, Cold Spring Harbor Laboratory Press
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403725/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403725/pdf/
identifier ISSN: 1088-9051
ispartof Genome research, 2003-09, Vol.13 (9), p.2178-2189
issn 1088-9051
1054-9803
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_403725
source MEDLINE; PubMed Central; Alma/SFX Local Collection
subjects Animals
Arabidopsis - genetics
Caenorhabditis elegans - genetics
Computational Biology - methods
Drosophila melanogaster - genetics
Eukaryotic Cells - chemistry
Eukaryotic Cells - metabolism
Genome
Genome, Fungal
Genome, Plant
Genome, Protozoan
Humans
Internet
Methods
Plasmodium falciparum - genetics
Saccharomyces cerevisiae - genetics
Sequence Homology, Nucleic Acid
Software
title OrthoMCL: identification of ortholog groups for eukaryotic genomes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T00%3A06%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OrthoMCL:%20identification%20of%20ortholog%20groups%20for%20eukaryotic%20genomes&rft.jtitle=Genome%20research&rft.au=Li,%20Li&rft.date=2003-09&rft.volume=13&rft.issue=9&rft.spage=2178&rft.epage=2189&rft.pages=2178-2189&rft.issn=1088-9051&rft_id=info:doi/10.1101/gr.1224503&rft_dat=%3Cproquest_pubme%3E18827748%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=18827748&rft_id=info:pmid/12952885&rfr_iscdi=true