Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis

Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast...

Detailed description

Bibliographic details
Published in: Nucleic acids research 2017-03, Vol.45 (5), p.2629-2643
Main authors: Zhu, Yafeng, Engström, Pär G, Tellgren-Roth, Christian, Baudo, Charles D, Kennell, John C, Sun, Sheng, Billmyre, R Blake, Schröder, Markus S, Andersson, Anna, Holm, Tina, Sigurgeirsson, Benjamin, Wu, Guangxi, Sankaranarayanan, Sundar Ram, Siddharthan, Rahul, Sanyal, Kaustuv, Lundeberg, Joakim, Nystedt, Björn, Boekhout, Teun, Dawson, Jr, Thomas L, Heitman, Joseph, Scheynius, Annika, Lehtiö, Janne
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2643
container_issue 5
container_start_page 2629
container_title Nucleic acids research
container_volume 45
creator Zhu, Yafeng
Engström, Pär G
Tellgren-Roth, Christian
Baudo, Charles D
Kennell, John C
Sun, Sheng
Billmyre, R Blake
Schröder, Markus S
Andersson, Anna
Holm, Tina
Sigurgeirsson, Benjamin
Wu, Guangxi
Sankaranarayanan, Sundar Ram
Siddharthan, Rahul
Sanyal, Kaustuv
Lundeberg, Joakim
Nystedt, Björn
Boekhout, Teun
Dawson, Jr, Thomas L
Heitman, Joseph
Scheynius, Annika
Lehtiö, Janne
description Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.
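The gene counts quoted in the description can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the catalog record: the helper name `percent_increase` and the variable names are illustrative assumptions, and the count derived from the 87% figure is only approximate because the abstract reports a rounded percentage.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Gene counts as stated in the abstract.
transcriptomics_only = 3612   # annotation from transcriptomics evidence alone
with_proteomics = 4113        # after integrating proteomics data
after_curation = 4493         # after manual curation

gain_proteomics = percent_increase(transcriptomics_only, with_proteomics)
gain_curation = percent_increase(with_proteomics, after_curation)

# Approximate number of genes confirmed by proteomics (87% is a rounded value).
proteomics_confirmed = round(0.87 * after_curation)

print(f"Proteomics integration: +{gain_proteomics:.1f}%")
print(f"Manual curation:        +{gain_curation:.1f}%")
print(f"Confirmed by proteomics: about {proteomics_confirmed} genes")
```

Both computed increases round to the 14% and 9% given in the abstract.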
doi_str_mv 10.1093/nar/gkx006
format Article
fullrecord <record><control><sourceid>proquest_swepu</sourceid><recordid>TN_cdi_swepub_primary_oai_swepub_ki_se_499668</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1861575302</sourcerecordid><originalsourceid>FETCH-LOGICAL-c492t-671c492e99ae7f1300ad8480faf9de2c266fbb68c7613f57ac6428dded64f1dc3</originalsourceid><addsrcrecordid>eNqNks-P1CAUx4nRuOPqxT_AcDTGukApLReTzfozWaMH9UoYeG1xWhihXR3_Cv9k6c44cQ8aTzx4Hz7vHb4IPaTkGSWyPPM6nnWb74SIW2hFS8EKLgW7jVakJFVBCW9O0L2UvhBCOa34XXTCGpppKVfo54cYJggd-DA6k_A2BjsbSNiEcRuhB5_cFWDtLe5d1w87rI2Zo55gQSdwvjDBOt_hrFg4HyY9ueCx81hfWwbI8PWA3E8JxnW2hBa_08Ny_eE0Trtxmy16cOk-utPqIcGDw3mKPr16-fHiTXH5_vXbi_PLwnDJpkLUdClASg11S0tCtG14Q1rdSgvMMCHa9Vo0pha0bKtaG8FZYy1YwVtqTXmKir03fYPtvFbb6EYddypopw5Pm1yB4lIK0WT-6V_5F-7zuQqxU_OsSkYYJf-Hp1lRzkRV_3ObI76ZesWoqPiyzfM9n-ERrAE_RT3c-Haz412vunClqrKRgooseHwQxPB1hjSp0SUDw6A9hDkp2gha1VVJWEaf7FETQ0oR2uMYStQSQJUDqPYBzPCjPxc7or8TV_4CTQHerg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1861575302</pqid></control><display><type>article</type><title>Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>SWEPUB Freely available online</source><source>Free Full-Text Journals in Chemistry</source><creator>Zhu, Yafeng ; Engström, Pär G ; Tellgren-Roth, Christian ; Baudo, Charles D ; Kennell, John C ; Sun, Sheng ; Billmyre, R Blake ; Schröder, Markus S ; Andersson, Anna ; Holm, Tina ; Sigurgeirsson, Benjamin ; Wu, Guangxi ; Sankaranarayanan, Sundar Ram ; Siddharthan, Rahul ; Sanyal, Kaustuv ; Lundeberg, Joakim ; Nystedt, Björn ; Boekhout, Teun ; Dawson, Jr, Thomas L ; Heitman, Joseph ; Scheynius, Annika ; Lehtiö, 
Janne</creator><creatorcontrib>Zhu, Yafeng ; Engström, Pär G ; Tellgren-Roth, Christian ; Baudo, Charles D ; Kennell, John C ; Sun, Sheng ; Billmyre, R Blake ; Schröder, Markus S ; Andersson, Anna ; Holm, Tina ; Sigurgeirsson, Benjamin ; Wu, Guangxi ; Sankaranarayanan, Sundar Ram ; Siddharthan, Rahul ; Sanyal, Kaustuv ; Lundeberg, Joakim ; Nystedt, Björn ; Boekhout, Teun ; Dawson, Jr, Thomas L ; Heitman, Joseph ; Scheynius, Annika ; Lehtiö, Janne</creatorcontrib><description>Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. 
sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkx006</identifier><identifier>PMID: 28100699</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>allele ; animal cell ; atopic dermatitis ; chromosome 5 ; controlled study ; DNA base composition ; DNA sequence ; functional genomics ; Fungal ; fungal gene ; fungal genome ; fungal protein ; Fungal Proteins ; Fungal Proteins - genetics ; gene locus ; gene mapping ; Genes ; Genes, Fungal ; genetics ; Genome ; Genome, Fungal ; Genome, Mitochondrial ; Genomics ; human ; Malassezia ; Malassezia - genetics ; Malassezia sympodialis ; Mitochondrial ; mitochondrial genome ; molecular genetics ; Molecular Sequence Annotation ; Molecular Sequence Annotation - methods ; nonhuman ; nucleotide sequence ; peptide ; peptide analysis ; Peptides ; Peptides - genetics ; phylogeny ; priority journal ; procedures ; protein domain ; Protein Domains ; proteogenomics ; Proteogenomics - methods ; RNA ; RNA sequence ; sequence analysis ; Sequence Analysis, RNA ; sequence homology ; transcriptomics</subject><ispartof>Nucleic acids research, 2017-03, Vol.45 (5), p.2629-2643</ispartof><rights>The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.</rights><rights>The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. 
2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c492t-671c492e99ae7f1300ad8480faf9de2c266fbb68c7613f57ac6428dded64f1dc3</citedby><cites>FETCH-LOGICAL-c492t-671c492e99ae7f1300ad8480faf9de2c266fbb68c7613f57ac6428dded64f1dc3</cites><orcidid>0000-0001-5265-2121 ; 0000-0003-1947-9026</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389616/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,550,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28100699$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216548$$DView record from Swedish Publication Index$$Hfree_for_read</backlink><backlink>$$Uhttps://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-142657$$DView record from Swedish Publication Index$$Hfree_for_read</backlink><backlink>$$Uhttps://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-320210$$DView record from Swedish Publication Index$$Hfree_for_read</backlink><backlink>$$Uhttp://kipublications.ki.se/Default.aspx?queryparsed=id:135440906$$DView record from Swedish Publication Index$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhu, Yafeng</creatorcontrib><creatorcontrib>Engström, Pär G</creatorcontrib><creatorcontrib>Tellgren-Roth, Christian</creatorcontrib><creatorcontrib>Baudo, Charles D</creatorcontrib><creatorcontrib>Kennell, John C</creatorcontrib><creatorcontrib>Sun, Sheng</creatorcontrib><creatorcontrib>Billmyre, R Blake</creatorcontrib><creatorcontrib>Schröder, Markus 
S</creatorcontrib><creatorcontrib>Andersson, Anna</creatorcontrib><creatorcontrib>Holm, Tina</creatorcontrib><creatorcontrib>Sigurgeirsson, Benjamin</creatorcontrib><creatorcontrib>Wu, Guangxi</creatorcontrib><creatorcontrib>Sankaranarayanan, Sundar Ram</creatorcontrib><creatorcontrib>Siddharthan, Rahul</creatorcontrib><creatorcontrib>Sanyal, Kaustuv</creatorcontrib><creatorcontrib>Lundeberg, Joakim</creatorcontrib><creatorcontrib>Nystedt, Björn</creatorcontrib><creatorcontrib>Boekhout, Teun</creatorcontrib><creatorcontrib>Dawson, Jr, Thomas L</creatorcontrib><creatorcontrib>Heitman, Joseph</creatorcontrib><creatorcontrib>Scheynius, Annika</creatorcontrib><creatorcontrib>Lehtiö, Janne</creatorcontrib><title>Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. 
This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.</description><subject>allele</subject><subject>animal cell</subject><subject>atopic dermatitis</subject><subject>chromosome 5</subject><subject>controlled study</subject><subject>DNA base composition</subject><subject>DNA sequence</subject><subject>functional genomics</subject><subject>Fungal</subject><subject>fungal gene</subject><subject>fungal genome</subject><subject>fungal protein</subject><subject>Fungal Proteins</subject><subject>Fungal Proteins - genetics</subject><subject>gene locus</subject><subject>gene mapping</subject><subject>Genes</subject><subject>Genes, Fungal</subject><subject>genetics</subject><subject>Genome</subject><subject>Genome, Fungal</subject><subject>Genome, Mitochondrial</subject><subject>Genomics</subject><subject>human</subject><subject>Malassezia</subject><subject>Malassezia - genetics</subject><subject>Malassezia sympodialis</subject><subject>Mitochondrial</subject><subject>mitochondrial genome</subject><subject>molecular genetics</subject><subject>Molecular Sequence Annotation</subject><subject>Molecular Sequence Annotation - methods</subject><subject>nonhuman</subject><subject>nucleotide sequence</subject><subject>peptide</subject><subject>peptide analysis</subject><subject>Peptides</subject><subject>Peptides - genetics</subject><subject>phylogeny</subject><subject>priority journal</subject><subject>procedures</subject><subject>protein domain</subject><subject>Protein 
Domains</subject><subject>proteogenomics</subject><subject>Proteogenomics - methods</subject><subject>RNA</subject><subject>RNA sequence</subject><subject>sequence analysis</subject><subject>Sequence Analysis, RNA</subject><subject>sequence homology</subject><subject>transcriptomics</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>D8T</sourceid><recordid>eNqNks-P1CAUx4nRuOPqxT_AcDTGukApLReTzfozWaMH9UoYeG1xWhihXR3_Cv9k6c44cQ8aTzx4Hz7vHb4IPaTkGSWyPPM6nnWb74SIW2hFS8EKLgW7jVakJFVBCW9O0L2UvhBCOa34XXTCGpppKVfo54cYJggd-DA6k_A2BjsbSNiEcRuhB5_cFWDtLe5d1w87rI2Zo55gQSdwvjDBOt_hrFg4HyY9ueCx81hfWwbI8PWA3E8JxnW2hBa_08Ny_eE0Trtxmy16cOk-utPqIcGDw3mKPr16-fHiTXH5_vXbi_PLwnDJpkLUdClASg11S0tCtG14Q1rdSgvMMCHa9Vo0pha0bKtaG8FZYy1YwVtqTXmKir03fYPtvFbb6EYddypopw5Pm1yB4lIK0WT-6V_5F-7zuQqxU_OsSkYYJf-Hp1lRzkRV_3ObI76ZesWoqPiyzfM9n-ERrAE_RT3c-Haz412vunClqrKRgooseHwQxPB1hjSp0SUDw6A9hDkp2gha1VVJWEaf7FETQ0oR2uMYStQSQJUDqPYBzPCjPxc7or8TV_4CTQHerg</recordid><startdate>20170317</startdate><enddate>20170317</enddate><creator>Zhu, Yafeng</creator><creator>Engström, Pär G</creator><creator>Tellgren-Roth, Christian</creator><creator>Baudo, Charles D</creator><creator>Kennell, John C</creator><creator>Sun, Sheng</creator><creator>Billmyre, R Blake</creator><creator>Schröder, Markus S</creator><creator>Andersson, Anna</creator><creator>Holm, Tina</creator><creator>Sigurgeirsson, Benjamin</creator><creator>Wu, Guangxi</creator><creator>Sankaranarayanan, Sundar Ram</creator><creator>Siddharthan, Rahul</creator><creator>Sanyal, Kaustuv</creator><creator>Lundeberg, Joakim</creator><creator>Nystedt, Björn</creator><creator>Boekhout, Teun</creator><creator>Dawson, Jr, Thomas L</creator><creator>Heitman, Joseph</creator><creator>Scheynius, Annika</creator><creator>Lehtiö, Janne</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><scope>ADTPV</scope><scope>AOWAS</scope><scope>D8V</scope><scope>DG7</scope><scope>ACNBI</scope><scope>D8T</scope><scope>DF2</scope><scope>ZZAVC</scope><orcidid>https://orcid.org/0000-0001-5265-2121</orcidid><orcidid>https://orcid.org/0000-0003-1947-9026</orcidid></search><sort><creationdate>20170317</creationdate><title>Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis</title><author>Zhu, Yafeng ; Engström, Pär G ; Tellgren-Roth, Christian ; Baudo, Charles D ; Kennell, John C ; Sun, Sheng ; Billmyre, R Blake ; Schröder, Markus S ; Andersson, Anna ; Holm, Tina ; Sigurgeirsson, Benjamin ; Wu, Guangxi ; Sankaranarayanan, Sundar Ram ; Siddharthan, Rahul ; Sanyal, Kaustuv ; Lundeberg, Joakim ; Nystedt, Björn ; Boekhout, Teun ; Dawson, Jr, Thomas L ; Heitman, Joseph ; Scheynius, Annika ; Lehtiö, Janne</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c492t-671c492e99ae7f1300ad8480faf9de2c266fbb68c7613f57ac6428dded64f1dc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>allele</topic><topic>animal cell</topic><topic>atopic dermatitis</topic><topic>chromosome 5</topic><topic>controlled study</topic><topic>DNA base composition</topic><topic>DNA sequence</topic><topic>functional genomics</topic><topic>Fungal</topic><topic>fungal gene</topic><topic>fungal genome</topic><topic>fungal protein</topic><topic>Fungal Proteins</topic><topic>Fungal Proteins - genetics</topic><topic>gene locus</topic><topic>gene mapping</topic><topic>Genes</topic><topic>Genes, Fungal</topic><topic>genetics</topic><topic>Genome</topic><topic>Genome, Fungal</topic><topic>Genome, 
Mitochondrial</topic><topic>Genomics</topic><topic>human</topic><topic>Malassezia</topic><topic>Malassezia - genetics</topic><topic>Malassezia sympodialis</topic><topic>Mitochondrial</topic><topic>mitochondrial genome</topic><topic>molecular genetics</topic><topic>Molecular Sequence Annotation</topic><topic>Molecular Sequence Annotation - methods</topic><topic>nonhuman</topic><topic>nucleotide sequence</topic><topic>peptide</topic><topic>peptide analysis</topic><topic>Peptides</topic><topic>Peptides - genetics</topic><topic>phylogeny</topic><topic>priority journal</topic><topic>procedures</topic><topic>protein domain</topic><topic>Protein Domains</topic><topic>proteogenomics</topic><topic>Proteogenomics - methods</topic><topic>RNA</topic><topic>RNA sequence</topic><topic>sequence analysis</topic><topic>Sequence Analysis, RNA</topic><topic>sequence homology</topic><topic>transcriptomics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhu, Yafeng</creatorcontrib><creatorcontrib>Engström, Pär G</creatorcontrib><creatorcontrib>Tellgren-Roth, Christian</creatorcontrib><creatorcontrib>Baudo, Charles D</creatorcontrib><creatorcontrib>Kennell, John C</creatorcontrib><creatorcontrib>Sun, Sheng</creatorcontrib><creatorcontrib>Billmyre, R Blake</creatorcontrib><creatorcontrib>Schröder, Markus S</creatorcontrib><creatorcontrib>Andersson, Anna</creatorcontrib><creatorcontrib>Holm, Tina</creatorcontrib><creatorcontrib>Sigurgeirsson, Benjamin</creatorcontrib><creatorcontrib>Wu, Guangxi</creatorcontrib><creatorcontrib>Sankaranarayanan, Sundar Ram</creatorcontrib><creatorcontrib>Siddharthan, Rahul</creatorcontrib><creatorcontrib>Sanyal, Kaustuv</creatorcontrib><creatorcontrib>Lundeberg, Joakim</creatorcontrib><creatorcontrib>Nystedt, Björn</creatorcontrib><creatorcontrib>Boekhout, Teun</creatorcontrib><creatorcontrib>Dawson, Jr, Thomas L</creatorcontrib><creatorcontrib>Heitman, Joseph</creatorcontrib><creatorcontrib>Scheynius, 
Annika</creatorcontrib><creatorcontrib>Lehtiö, Janne</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>SwePub</collection><collection>SwePub Articles</collection><collection>SWEPUB Kungliga Tekniska Högskolan</collection><collection>SWEPUB Stockholms universitet</collection><collection>SWEPUB Uppsala universitet full text</collection><collection>SWEPUB Freely available online</collection><collection>SWEPUB Uppsala universitet</collection><collection>SwePub Articles full text</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhu, Yafeng</au><au>Engström, Pär G</au><au>Tellgren-Roth, Christian</au><au>Baudo, Charles D</au><au>Kennell, John C</au><au>Sun, Sheng</au><au>Billmyre, R Blake</au><au>Schröder, Markus S</au><au>Andersson, Anna</au><au>Holm, Tina</au><au>Sigurgeirsson, Benjamin</au><au>Wu, Guangxi</au><au>Sankaranarayanan, Sundar Ram</au><au>Siddharthan, Rahul</au><au>Sanyal, Kaustuv</au><au>Lundeberg, Joakim</au><au>Nystedt, Björn</au><au>Boekhout, Teun</au><au>Dawson, Jr, Thomas L</au><au>Heitman, Joseph</au><au>Scheynius, Annika</au><au>Lehtiö, Janne</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids 
Res</addtitle><date>2017-03-17</date><risdate>2017</risdate><volume>45</volume><issue>5</issue><spage>2629</spage><epage>2643</epage><pages>2629-2643</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><abstract>Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>28100699</pmid><doi>10.1093/nar/gkx006</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0001-5265-2121</orcidid><orcidid>https://orcid.org/0000-0003-1947-9026</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2017-03, Vol.45 (5), p.2629-2643
issn 0305-1048
1362-4962
1362-4962
language eng
recordid cdi_swepub_primary_oai_swepub_ki_se_499668
source Oxford Journals Open Access Collection; MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central; SWEPUB Freely available online; Free Full-Text Journals in Chemistry
subjects allele
animal cell
atopic dermatitis
chromosome 5
controlled study
DNA base composition
DNA sequence
functional genomics
Fungal
fungal gene
fungal genome
fungal protein
Fungal Proteins
Fungal Proteins - genetics
gene locus
gene mapping
Genes
Genes, Fungal
genetics
Genome
Genome, Fungal
Genome, Mitochondrial
Genomics
human
Malassezia
Malassezia - genetics
Malassezia sympodialis
Mitochondrial
mitochondrial genome
molecular genetics
Molecular Sequence Annotation
Molecular Sequence Annotation - methods
nonhuman
nucleotide sequence
peptide
peptide analysis
Peptides
Peptides - genetics
phylogeny
priority journal
procedures
protein domain
Protein Domains
proteogenomics
Proteogenomics - methods
RNA
RNA sequence
sequence analysis
Sequence Analysis, RNA
sequence homology
transcriptomics
title Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T02%3A54%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_swepu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Proteogenomics%20produces%20comprehensive%20and%20highly%20accurate%20protein-coding%20gene%20annotation%20in%20a%20complete%20genome%20assembly%20of%20Malassezia%20sympodialis&rft.jtitle=Nucleic%20acids%20research&rft.au=Zhu,%20Yafeng&rft.date=2017-03-17&rft.volume=45&rft.issue=5&rft.spage=2629&rft.epage=2643&rft.pages=2629-2643&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkx006&rft_dat=%3Cproquest_swepu%3E1861575302%3C/proquest_swepu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1861575302&rft_id=info:pmid/28100699&rfr_iscdi=true