Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment ass...

Bibliographic details
Published in: Nucleic acids research 2003-10, Vol.31 (19), p.5654-5666
Main authors: Haas, Brian J., Delcher, Arthur L., Mount, Stephen M., Wortman, Jennifer R., Smith Jr, Roger K., Hannick, Linda I., Maiti, Rama, Ronning, Catherine M., Rusch, Douglas B., Town, Christopher D., Salzberg, Steven L., White, Owen
Format: Article
Language: English
Online access: Full text
container_end_page 5666
container_issue 19
container_start_page 5654
container_title Nucleic acids research
container_volume 31
creator Haas, Brian J.
Delcher, Arthur L.
Mount, Stephen M.
Wortman, Jennifer R.
Smith Jr, Roger K.
Hannick, Linda I.
Maiti, Rama
Ronning, Catherine M.
Rusch, Douglas B.
Town, Christopher D.
Salzberg, Steven L.
White, Owen
description The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
doi_str_mv 10.1093/nar/gkg770
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_206470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73671874</sourcerecordid><originalsourceid>FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</originalsourceid><addsrcrecordid>eNqNkU1v1DAQhi0EotvChR-AIg4cKoWO46_4wKFUlK1UiQtICxfLyU5St4md2klV_j2udlU-LnDyyPO8o3fmJeQVhXcUNDvxNp70N71S8ISsKJNVybWsnpIVMBAlBV4fkMOUrgEop4I_JweUC4C60ivy7WKcYrhzvi_mKyxOo23cNkzJpaJHH0YsrPdhtrMLvljSAzfaezfaoZij9amNbpoLO7jej-hzlRKOzeAwvSDPOjskfLl_j8jX849fztbl5edPF2enl2UrOJ1Lzbu2Q0q1tAKEhapqoG2UrJTVHW5b2OJWImsQdCdbsPm3xbqTNXYgNGfsiLzfzZ2WZsyC7CLawUwxm4w_TLDO_Nnx7sr04c5UILmCrH-718dwu2CazehSi8NgPYYlGcWkorXi_wTzClKLuv4PkAotQGbwzV_gdViiz9fK5kDUVNQiQ8c7qI0hpYjd42oUzEP-Judvdvln-PXvx_iF7gPPQLkDXJrx_rFv442Riilh1pvvZsMAzj-sN4azn9uKvqI</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200581585</pqid></control><display><type>article</type><title>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</title><source>MEDLINE</source><source>PubMed (Medline)</source><source>Oxford Journals Open Access Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen</creator><creatorcontrib>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. 
; White, Owen</creatorcontrib><description>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and &gt;1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkg770</identifier><identifier>PMID: 14500829</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Alternative Splicing ; Arabidopsis - genetics ; Arabidopsis - metabolism ; Arabidopsis thaliana ; DNA, Complementary - analysis ; Expressed Sequence Tags ; Genome, Plant ; Introns ; Plant Proteins - genetics ; RNA, Plant - analysis ; RNA, Plant - chemistry ; Sequence Alignment - methods ; Software ; Transcription, Genetic ; Untranslated Regions</subject><ispartof>Nucleic acids research, 2003-10, Vol.31 (19), p.5654-5666</ispartof><rights>Copyright Oxford University Press(England) Oct 01, 2003</rights><rights>Copyright © 2003 Oxford University Press 
2003</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC206470/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC206470/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14500829$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Haas, Brian J.</creatorcontrib><creatorcontrib>Delcher, Arthur L.</creatorcontrib><creatorcontrib>Mount, Stephen M.</creatorcontrib><creatorcontrib>Wortman, Jennifer R.</creatorcontrib><creatorcontrib>Smith Jr, Roger K.</creatorcontrib><creatorcontrib>Hannick, Linda I.</creatorcontrib><creatorcontrib>Maiti, Rama</creatorcontrib><creatorcontrib>Ronning, Catherine M.</creatorcontrib><creatorcontrib>Rusch, Douglas B.</creatorcontrib><creatorcontrib>Town, Christopher D.</creatorcontrib><creatorcontrib>Salzberg, Steven L.</creatorcontrib><creatorcontrib>White, Owen</creatorcontrib><title>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. 
A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and &gt;1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</description><subject>Algorithms</subject><subject>Alternative Splicing</subject><subject>Arabidopsis - genetics</subject><subject>Arabidopsis - metabolism</subject><subject>Arabidopsis thaliana</subject><subject>DNA, Complementary - analysis</subject><subject>Expressed Sequence Tags</subject><subject>Genome, Plant</subject><subject>Introns</subject><subject>Plant Proteins - genetics</subject><subject>RNA, Plant - analysis</subject><subject>RNA, Plant - chemistry</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><subject>Transcription, Genetic</subject><subject>Untranslated 
Regions</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkU1v1DAQhi0EotvChR-AIg4cKoWO46_4wKFUlK1UiQtICxfLyU5St4md2klV_j2udlU-LnDyyPO8o3fmJeQVhXcUNDvxNp70N71S8ISsKJNVybWsnpIVMBAlBV4fkMOUrgEop4I_JweUC4C60ivy7WKcYrhzvi_mKyxOo23cNkzJpaJHH0YsrPdhtrMLvljSAzfaezfaoZij9amNbpoLO7jej-hzlRKOzeAwvSDPOjskfLl_j8jX849fztbl5edPF2enl2UrOJ1Lzbu2Q0q1tAKEhapqoG2UrJTVHW5b2OJWImsQdCdbsPm3xbqTNXYgNGfsiLzfzZ2WZsyC7CLawUwxm4w_TLDO_Nnx7sr04c5UILmCrH-718dwu2CazehSi8NgPYYlGcWkorXi_wTzClKLuv4PkAotQGbwzV_gdViiz9fK5kDUVNQiQ8c7qI0hpYjd42oUzEP-Judvdvln-PXvx_iF7gPPQLkDXJrx_rFv442Riilh1pvvZsMAzj-sN4azn9uKvqI</recordid><startdate>20031001</startdate><enddate>20031001</enddate><creator>Haas, Brian J.</creator><creator>Delcher, Arthur L.</creator><creator>Mount, Stephen M.</creator><creator>Wortman, Jennifer R.</creator><creator>Smith Jr, Roger K.</creator><creator>Hannick, Linda I.</creator><creator>Maiti, Rama</creator><creator>Ronning, Catherine M.</creator><creator>Rusch, Douglas B.</creator><creator>Town, Christopher D.</creator><creator>Salzberg, Steven L.</creator><creator>White, Owen</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20031001</creationdate><title>Improving the Arabidopsis genome annotation using maximal transcript alignment 
assemblies</title><author>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Algorithms</topic><topic>Alternative Splicing</topic><topic>Arabidopsis - genetics</topic><topic>Arabidopsis - metabolism</topic><topic>Arabidopsis thaliana</topic><topic>DNA, Complementary - analysis</topic><topic>Expressed Sequence Tags</topic><topic>Genome, Plant</topic><topic>Introns</topic><topic>Plant Proteins - genetics</topic><topic>RNA, Plant - analysis</topic><topic>RNA, Plant - chemistry</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><topic>Transcription, Genetic</topic><topic>Untranslated Regions</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Haas, Brian J.</creatorcontrib><creatorcontrib>Delcher, Arthur L.</creatorcontrib><creatorcontrib>Mount, Stephen M.</creatorcontrib><creatorcontrib>Wortman, Jennifer R.</creatorcontrib><creatorcontrib>Smith Jr, Roger K.</creatorcontrib><creatorcontrib>Hannick, Linda I.</creatorcontrib><creatorcontrib>Maiti, Rama</creatorcontrib><creatorcontrib>Ronning, Catherine M.</creatorcontrib><creatorcontrib>Rusch, Douglas B.</creatorcontrib><creatorcontrib>Town, Christopher D.</creatorcontrib><creatorcontrib>Salzberg, Steven L.</creatorcontrib><creatorcontrib>White, Owen</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Haas, Brian J.</au><au>Delcher, Arthur L.</au><au>Mount, Stephen M.</au><au>Wortman, Jennifer R.</au><au>Smith Jr, Roger K.</au><au>Hannick, Linda I.</au><au>Maiti, Rama</au><au>Ronning, Catherine M.</au><au>Rusch, Douglas B.</au><au>Town, Christopher D.</au><au>Salzberg, Steven L.</au><au>White, Owen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. 
Acids Res</addtitle><date>2003-10-01</date><risdate>2003</risdate><volume>31</volume><issue>19</issue><spage>5654</spage><epage>5666</epage><pages>5654-5666</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and &gt;1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>14500829</pmid><doi>10.1093/nar/gkg770</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2003-10, Vol.31 (19), p.5654-5666
issn 0305-1048
1362-4962
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_206470
source MEDLINE; PubMed (Medline); Oxford Journals Open Access Collection; Free Full-Text Journals in Chemistry
subjects Algorithms
Alternative Splicing
Arabidopsis - genetics
Arabidopsis - metabolism
Arabidopsis thaliana
DNA, Complementary - analysis
Expressed Sequence Tags
Genome, Plant
Introns
Plant Proteins - genetics
RNA, Plant - analysis
RNA, Plant - chemistry
Sequence Alignment - methods
Software
Transcription, Genetic
Untranslated Regions
title Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T21%3A38%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20the%20Arabidopsis%20genome%20annotation%20using%20maximal%20transcript%20alignment%20assemblies&rft.jtitle=Nucleic%20acids%20research&rft.au=Haas,%20Brian%20J.&rft.date=2003-10-01&rft.volume=31&rft.issue=19&rft.spage=5654&rft.epage=5666&rft.pages=5654-5666&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkg770&rft_dat=%3Cproquest_pubme%3E73671874%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200581585&rft_id=info:pmid/14500829&rfr_iscdi=true