Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data

Abstract We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNA...


Bibliographic details
Published in: G3 : genes - genomes - genetics 2015-08, Vol.5 (8), p.1721-1736
Main authors: Matthews, Beverley B, dos Santos, Gilberto, Crosby, Madeline A, Emmert, David B, St. Pierre, Susan E, Gramates, L Sian, Zhou, Pinglei, Schroeder, Andrew J, Falls, Kathleen, Strelets, Victor, Russo, Susan M, Gelbart, William M
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1736
container_issue 8
container_start_page 1721
container_title G3 : genes - genomes - genetics
container_volume 5
creator Matthews, Beverley B
dos Santos, Gilberto
Crosby, Madeline A
Emmert, David B
St. Pierre, Susan E
Gramates, L Sian
Zhou, Pinglei
Schroeder, Andrew J
Falls, Kathleen
Strelets, Victor
Russo, Susan M
Gelbart, William M
description Abstract We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
doi_str_mv 10.1534/g3.115.018929
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4528329</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1534/g3.115.018929</oup_id><sourcerecordid>1702650703</sourcerecordid><originalsourceid>FETCH-LOGICAL-c486t-66c596a4d7f94d5a0e510ba3db0cb777c4b92e17bd0de3427b046ce5f692df383</originalsourceid><addsrcrecordid>eNqFUU1LxDAQDaK4snr0Kjl66ZrvtB6Exa8VVgTRc0jT9EPapiap4L-3srrqybnMwDzem3kPgGOMFphTdlbRBcZ8gXCakWwHHBAsUIJTKnZ_zTNwFMILmopzIZjYBzMiMMoolwfg8db2Ft67wrZw2fcu6ti4PsDSeXjlXXBD3bQadrbVvat0iNafw7tu0CZCV8JVU9XJU-3dWNXDGOGVjvoQ7JW6Dfboq8_B88310-UqWT_c3l0u14lhqYiJEIZnQrNClhkruEaWY5RrWuTI5FJKw_KMWCzzAhWWMiJzxISxvBQZKUqa0jm42PAOY97Zwtg-et2qwTed9u_K6Ub93fRNrSr3phgnKSXZRHD6ReDd62hDVF0TjG2nV60bg8ISEcGRRHSCJhuomTwJ3pZbGYzUZxSqomqKQm2imPAnv2_bor-N_9F24_AP1wfwoZEo</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1702650703</pqid></control><display><type>article</type><title>Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>TestCollectionTL3OpenAccess</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Matthews, Beverley B ; dos Santos, Gilberto ; Crosby, Madeline A ; Emmert, David B ; St. Pierre, Susan E ; Gramates, L Sian ; Zhou, Pinglei ; Schroeder, Andrew J ; Falls, Kathleen ; Strelets, Victor ; Russo, Susan M ; Gelbart, William M</creator><creatorcontrib>Matthews, Beverley B ; dos Santos, Gilberto ; Crosby, Madeline A ; Emmert, David B ; St. 
Pierre, Susan E ; Gramates, L Sian ; Zhou, Pinglei ; Schroeder, Andrew J ; Falls, Kathleen ; Strelets, Victor ; Russo, Susan M ; Gelbart, William M ; FlyBase Consortium ; the FlyBase Consortium</creatorcontrib><description>Abstract We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. 
We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.</description><identifier>ISSN: 2160-1836</identifier><identifier>EISSN: 2160-1836</identifier><identifier>DOI: 10.1534/g3.115.018929</identifier><identifier>PMID: 26109357</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>3' Untranslated Regions ; Animals ; Databases, Genetic ; Drosophila melanogaster - genetics ; Exons ; Female ; Investigations ; Male ; Models, Genetic ; Molecular Sequence Annotation ; RNA, Small Untranslated - chemistry ; RNA, Small Untranslated - metabolism ; Sequence Analysis, RNA ; Transcription Initiation Site ; Transcriptome</subject><ispartof>G3 : genes - genomes - genetics, 2015-08, Vol.5 (8), p.1721-1736</ispartof><rights>2015 Matthews et al. 2015</rights><rights>Copyright © 2015 Matthews et al.</rights><rights>Copyright © 2015 Matthews 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c486t-66c596a4d7f94d5a0e510ba3db0cb777c4b92e17bd0de3427b046ce5f692df383</citedby><cites>FETCH-LOGICAL-c486t-66c596a4d7f94d5a0e510ba3db0cb777c4b92e17bd0de3427b046ce5f692df383</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528329/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528329/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26109357$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Matthews, Beverley 
B</creatorcontrib><creatorcontrib>dos Santos, Gilberto</creatorcontrib><creatorcontrib>Crosby, Madeline A</creatorcontrib><creatorcontrib>Emmert, David B</creatorcontrib><creatorcontrib>St. Pierre, Susan E</creatorcontrib><creatorcontrib>Gramates, L Sian</creatorcontrib><creatorcontrib>Zhou, Pinglei</creatorcontrib><creatorcontrib>Schroeder, Andrew J</creatorcontrib><creatorcontrib>Falls, Kathleen</creatorcontrib><creatorcontrib>Strelets, Victor</creatorcontrib><creatorcontrib>Russo, Susan M</creatorcontrib><creatorcontrib>Gelbart, William M</creatorcontrib><creatorcontrib>FlyBase Consortium</creatorcontrib><creatorcontrib>the FlyBase Consortium</creatorcontrib><title>Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data</title><title>G3 : genes - genomes - genetics</title><addtitle>G3 (Bethesda)</addtitle><description>Abstract We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). 
Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.</description><subject>3' Untranslated Regions</subject><subject>Animals</subject><subject>Databases, Genetic</subject><subject>Drosophila melanogaster - genetics</subject><subject>Exons</subject><subject>Female</subject><subject>Investigations</subject><subject>Male</subject><subject>Models, Genetic</subject><subject>Molecular Sequence Annotation</subject><subject>RNA, Small Untranslated - chemistry</subject><subject>RNA, Small Untranslated - metabolism</subject><subject>Sequence Analysis, RNA</subject><subject>Transcription Initiation Site</subject><subject>Transcriptome</subject><issn>2160-1836</issn><issn>2160-1836</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFUU1LxDAQDaK4snr0Kjl66ZrvtB6Exa8VVgTRc0jT9EPapiap4L-3srrqybnMwDzem3kPgGOMFphTdlbRBcZ8gXCakWwHHBAsUIJTKnZ_zTNwFMILmopzIZjYBzMiMMoolwfg8db2Ft67wrZw2fcu6ti4PsDSeXjlXXBD3bQadrbVvat0iNafw7tu0CZCV8JVU9XJU-3dWNXDGOGVjvoQ7JW6Dfboq8_B88310-UqWT_c3l0u14lhqYiJEIZnQrNClhkruEaWY5RrWuTI5FJKw_KMWCzzAhWWMiJzxISxvBQZKUqa0jm42PAOY97Zwtg-et2qwTed9u_K6Ub93fRNrSr3phgnKSXZRHD6ReDd62hDVF0TjG2nV60bg8ISEcGRRHSCJhuomTwJ3pZbGYzUZxSqomqKQm2imPAnv2_bor-N_9F24_AP1wfwoZEo</recordid><startdate>20150801</startdate><enddate>20150801</enddate><creator>Matthews, Beverley B</creator><creator>dos Santos, Gilberto</creator><creator>Crosby, Madeline 
A</creator><creator>Emmert, David B</creator><creator>St. Pierre, Susan E</creator><creator>Gramates, L Sian</creator><creator>Zhou, Pinglei</creator><creator>Schroeder, Andrew J</creator><creator>Falls, Kathleen</creator><creator>Strelets, Victor</creator><creator>Russo, Susan M</creator><creator>Gelbart, William M</creator><general>Oxford University Press</general><general>Genetics Society of America</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20150801</creationdate><title>Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data</title><author>Matthews, Beverley B ; dos Santos, Gilberto ; Crosby, Madeline A ; Emmert, David B ; St. Pierre, Susan E ; Gramates, L Sian ; Zhou, Pinglei ; Schroeder, Andrew J ; Falls, Kathleen ; Strelets, Victor ; Russo, Susan M ; Gelbart, William M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c486t-66c596a4d7f94d5a0e510ba3db0cb777c4b92e17bd0de3427b046ce5f692df383</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>3' Untranslated Regions</topic><topic>Animals</topic><topic>Databases, Genetic</topic><topic>Drosophila melanogaster - genetics</topic><topic>Exons</topic><topic>Female</topic><topic>Investigations</topic><topic>Male</topic><topic>Models, Genetic</topic><topic>Molecular Sequence Annotation</topic><topic>RNA, Small Untranslated - chemistry</topic><topic>RNA, Small Untranslated - metabolism</topic><topic>Sequence Analysis, RNA</topic><topic>Transcription Initiation Site</topic><topic>Transcriptome</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Matthews, Beverley B</creatorcontrib><creatorcontrib>dos Santos, Gilberto</creatorcontrib><creatorcontrib>Crosby, Madeline 
A</creatorcontrib><creatorcontrib>Emmert, David B</creatorcontrib><creatorcontrib>St. Pierre, Susan E</creatorcontrib><creatorcontrib>Gramates, L Sian</creatorcontrib><creatorcontrib>Zhou, Pinglei</creatorcontrib><creatorcontrib>Schroeder, Andrew J</creatorcontrib><creatorcontrib>Falls, Kathleen</creatorcontrib><creatorcontrib>Strelets, Victor</creatorcontrib><creatorcontrib>Russo, Susan M</creatorcontrib><creatorcontrib>Gelbart, William M</creatorcontrib><creatorcontrib>FlyBase Consortium</creatorcontrib><creatorcontrib>the FlyBase Consortium</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>G3 : genes - genomes - genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matthews, Beverley B</au><au>dos Santos, Gilberto</au><au>Crosby, Madeline A</au><au>Emmert, David B</au><au>St. 
Pierre, Susan E</au><au>Gramates, L Sian</au><au>Zhou, Pinglei</au><au>Schroeder, Andrew J</au><au>Falls, Kathleen</au><au>Strelets, Victor</au><au>Russo, Susan M</au><au>Gelbart, William M</au><aucorp>FlyBase Consortium</aucorp><aucorp>the FlyBase Consortium</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data</atitle><jtitle>G3 : genes - genomes - genetics</jtitle><addtitle>G3 (Bethesda)</addtitle><date>2015-08-01</date><risdate>2015</risdate><volume>5</volume><issue>8</issue><spage>1721</spage><epage>1736</epage><pages>1721-1736</pages><issn>2160-1836</issn><eissn>2160-1836</eissn><abstract>Abstract We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). 
Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>26109357</pmid><doi>10.1534/g3.115.018929</doi><tpages>16</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2160-1836
ispartof G3 : genes - genomes - genetics, 2015-08, Vol.5 (8), p.1721-1736
issn 2160-1836
2160-1836
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4528329
source Oxford Journals Open Access Collection; MEDLINE; TestCollectionTL3OpenAccess; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects 3' Untranslated Regions
Animals
Databases, Genetic
Drosophila melanogaster - genetics
Exons
Female
Investigations
Male
Models, Genetic
Molecular Sequence Annotation
RNA, Small Untranslated - chemistry
RNA, Small Untranslated - metabolism
Sequence Analysis, RNA
Transcription Initiation Site
Transcriptome
title Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T17%3A51%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Gene%20Model%20Annotations%20for%20Drosophila%20melanogaster:%20Impact%20of%20High-Throughput%20Data&rft.jtitle=G3%20:%20genes%20-%20genomes%20-%20genetics&rft.au=Matthews,%20Beverley%20B&rft.aucorp=FlyBase%20Consortium&rft.date=2015-08-01&rft.volume=5&rft.issue=8&rft.spage=1721&rft.epage=1736&rft.pages=1721-1736&rft.issn=2160-1836&rft.eissn=2160-1836&rft_id=info:doi/10.1534/g3.115.018929&rft_dat=%3Cproquest_pubme%3E1702650703%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1702650703&rft_id=info:pmid/26109357&rft_oup_id=10.1534/g3.115.018929&rfr_iscdi=true