Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast...
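Published in: | Nucleic acids research 2017-03, Vol.45 (5), p.2629-2643 |
---|---|
Main authors: | Zhu, Yafeng ; Engström, Pär G ; Tellgren-Roth, Christian ; Baudo, Charles D ; Kennell, John C ; Sun, Sheng ; Billmyre, R Blake ; Schröder, Markus S ; Andersson, Anna ; Holm, Tina ; Sigurgeirsson, Benjamin ; Wu, Guangxi ; Sankaranarayanan, Sundar Ram ; Siddharthan, Rahul ; Sanyal, Kaustuv ; Lundeberg, Joakim ; Nystedt, Björn ; Boekhout, Teun ; Dawson, Jr, Thomas L ; Heitman, Joseph ; Scheynius, Annika ; Lehtiö, Janne |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |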
container_end_page | 2643 |
---|---|
container_issue | 5 |
container_start_page | 2629 |
container_title | Nucleic acids research |
container_volume | 45 |
creator | Zhu, Yafeng ; Engström, Pär G ; Tellgren-Roth, Christian ; Baudo, Charles D ; Kennell, John C ; Sun, Sheng ; Billmyre, R Blake ; Schröder, Markus S ; Andersson, Anna ; Holm, Tina ; Sigurgeirsson, Benjamin ; Wu, Guangxi ; Sankaranarayanan, Sundar Ram ; Siddharthan, Rahul ; Sanyal, Kaustuv ; Lundeberg, Joakim ; Nystedt, Björn ; Boekhout, Teun ; Dawson, Jr, Thomas L ; Heitman, Joseph ; Scheynius, Annika ; Lehtiö, Janne |
description | Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. |
doi_str_mv | 10.1093/nar/gkx006 |
format | Article |
fullrecord | PMID: 28100699; EISSN: 1362-4962; publication date: 2017-03-17; total pages: 15; publisher: Oxford University Press (England); rights: The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research; peer reviewed; open access (free for read); full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/; ORCID: https://orcid.org/0000-0001-5265-2121, https://orcid.org/0000-0003-1947-9026 |
fulltext | fulltext |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2017-03, Vol.45 (5), p.2629-2643 |
issn | 0305-1048; 1362-4962 |
language | eng |
recordid | cdi_swepub_primary_oai_swepub_ki_se_499668 |
source | Oxford Journals Open Access Collection; MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central; SWEPUB Freely available online; Free Full-Text Journals in Chemistry |
subjects | allele ; animal cell ; atopic dermatitis ; chromosome 5 ; controlled study ; DNA base composition ; DNA sequence ; functional genomics ; Fungal ; fungal gene ; fungal genome ; fungal protein ; Fungal Proteins ; Fungal Proteins - genetics ; gene locus ; gene mapping ; Genes ; Genes, Fungal ; genetics ; Genome ; Genome, Fungal ; Genome, Mitochondrial ; Genomics ; human ; Malassezia ; Malassezia - genetics ; Malassezia sympodialis ; Mitochondrial ; mitochondrial genome ; molecular genetics ; Molecular Sequence Annotation ; Molecular Sequence Annotation - methods ; nonhuman ; nucleotide sequence ; peptide ; peptide analysis ; Peptides ; Peptides - genetics ; phylogeny ; priority journal ; procedures ; protein domain ; Protein Domains ; proteogenomics ; Proteogenomics - methods ; RNA ; RNA sequence ; sequence analysis ; Sequence Analysis, RNA ; sequence homology ; transcriptomics |
title | Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T02%3A54%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_swepu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Proteogenomics%20produces%20comprehensive%20and%20highly%20accurate%20protein-coding%20gene%20annotation%20in%20a%20complete%20genome%20assembly%20of%20Malassezia%20sympodialis&rft.jtitle=Nucleic%20acids%20research&rft.au=Zhu,%20Yafeng&rft.date=2017-03-17&rft.volume=45&rft.issue=5&rft.spage=2629&rft.epage=2643&rft.pages=2629-2643&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkx006&rft_dat=%3Cproquest_swepu%3E1861575302%3C/proquest_swepu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1861575302&rft_id=info:pmid/28100699&rfr_iscdi=true |
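
The percentage gains quoted in the description field can be checked directly from the gene counts it reports. The following is a minimal sketch that assumes only the three counts given in the abstract (3612, 4113 and 4493); the helper name `percent_increase` is hypothetical and is not taken from the record or the paper.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Protein-coding gene counts quoted in the description field.
transcriptomics_only = 3612   # annotation using transcriptomics evidence alone
with_proteomics = 4113        # after integrating proteomics data
after_curation = 4493         # after manual curation

proteomics_gain = percent_increase(transcriptomics_only, with_proteomics)
curation_gain = percent_increase(with_proteomics, after_curation)

print(f"proteomics integration: +{proteomics_gain:.1f}%")
print(f"manual curation: +{curation_gain:.1f}%")
```

Rounded to whole percentages, the two values agree with the 14% and 9% stated in the abstract; each gain is computed relative to the count from the preceding step, not the original baseline.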