Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies
The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment ass...
Published in: | Nucleic acids research 2003-10, Vol.31 (19), p.5654-5666 |
---|---|
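Main authors: | Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |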
container_end_page | 5666 |
---|---|
container_issue | 19 |
container_start_page | 5654 |
container_title | Nucleic acids research |
container_volume | 31 |
creator | Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen |
description | The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations. |
doi_str_mv | 10.1093/nar/gkg770 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_206470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73671874</sourcerecordid><originalsourceid>FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</originalsourceid><addsrcrecordid>eNqNkU1v1DAQhi0EotvChR-AIg4cKoWO46_4wKFUlK1UiQtICxfLyU5St4md2klV_j2udlU-LnDyyPO8o3fmJeQVhXcUNDvxNp70N71S8ISsKJNVybWsnpIVMBAlBV4fkMOUrgEop4I_JweUC4C60ivy7WKcYrhzvi_mKyxOo23cNkzJpaJHH0YsrPdhtrMLvljSAzfaezfaoZij9amNbpoLO7jej-hzlRKOzeAwvSDPOjskfLl_j8jX849fztbl5edPF2enl2UrOJ1Lzbu2Q0q1tAKEhapqoG2UrJTVHW5b2OJWImsQdCdbsPm3xbqTNXYgNGfsiLzfzZ2WZsyC7CLawUwxm4w_TLDO_Nnx7sr04c5UILmCrH-718dwu2CazehSi8NgPYYlGcWkorXi_wTzClKLuv4PkAotQGbwzV_gdViiz9fK5kDUVNQiQ8c7qI0hpYjd42oUzEP-Judvdvln-PXvx_iF7gPPQLkDXJrx_rFv442Riilh1pvvZsMAzj-sN4azn9uKvqI</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200581585</pqid></control><display><type>article</type><title>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</title><source>MEDLINE</source><source>PubMed (Medline)</source><source>Oxford Journals Open Access Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen</creator><creatorcontrib>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. 
; White, Owen</creatorcontrib><description>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkg770</identifier><identifier>PMID: 14500829</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Alternative Splicing ; Arabidopsis - genetics ; Arabidopsis - metabolism ; Arabidopsis thaliana ; DNA, Complementary - analysis ; Expressed Sequence Tags ; Genome, Plant ; Introns ; Plant Proteins - genetics ; RNA, Plant - analysis ; RNA, Plant - chemistry ; Sequence Alignment - methods ; Software ; Transcription, Genetic ; Untranslated Regions</subject><ispartof>Nucleic acids research, 2003-10, Vol.31 (19), p.5654-5666</ispartof><rights>Copyright Oxford University Press(England) Oct 01, 2003</rights><rights>Copyright © 2003 Oxford University Press 
2003</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC206470/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC206470/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14500829$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Haas, Brian J.</creatorcontrib><creatorcontrib>Delcher, Arthur L.</creatorcontrib><creatorcontrib>Mount, Stephen M.</creatorcontrib><creatorcontrib>Wortman, Jennifer R.</creatorcontrib><creatorcontrib>Smith Jr, Roger K.</creatorcontrib><creatorcontrib>Hannick, Linda I.</creatorcontrib><creatorcontrib>Maiti, Rama</creatorcontrib><creatorcontrib>Ronning, Catherine M.</creatorcontrib><creatorcontrib>Rusch, Douglas B.</creatorcontrib><creatorcontrib>Town, Christopher D.</creatorcontrib><creatorcontrib>Salzberg, Steven L.</creatorcontrib><creatorcontrib>White, Owen</creatorcontrib><title>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. 
A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</description><subject>Algorithms</subject><subject>Alternative Splicing</subject><subject>Arabidopsis - genetics</subject><subject>Arabidopsis - metabolism</subject><subject>Arabidopsis thaliana</subject><subject>DNA, Complementary - analysis</subject><subject>Expressed Sequence Tags</subject><subject>Genome, Plant</subject><subject>Introns</subject><subject>Plant Proteins - genetics</subject><subject>RNA, Plant - analysis</subject><subject>RNA, Plant - chemistry</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><subject>Transcription, Genetic</subject><subject>Untranslated 
Regions</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkU1v1DAQhi0EotvChR-AIg4cKoWO46_4wKFUlK1UiQtICxfLyU5St4md2klV_j2udlU-LnDyyPO8o3fmJeQVhXcUNDvxNp70N71S8ISsKJNVybWsnpIVMBAlBV4fkMOUrgEop4I_JweUC4C60ivy7WKcYrhzvi_mKyxOo23cNkzJpaJHH0YsrPdhtrMLvljSAzfaezfaoZij9amNbpoLO7jej-hzlRKOzeAwvSDPOjskfLl_j8jX849fztbl5edPF2enl2UrOJ1Lzbu2Q0q1tAKEhapqoG2UrJTVHW5b2OJWImsQdCdbsPm3xbqTNXYgNGfsiLzfzZ2WZsyC7CLawUwxm4w_TLDO_Nnx7sr04c5UILmCrH-718dwu2CazehSi8NgPYYlGcWkorXi_wTzClKLuv4PkAotQGbwzV_gdViiz9fK5kDUVNQiQ8c7qI0hpYjd42oUzEP-Judvdvln-PXvx_iF7gPPQLkDXJrx_rFv442Riilh1pvvZsMAzj-sN4azn9uKvqI</recordid><startdate>20031001</startdate><enddate>20031001</enddate><creator>Haas, Brian J.</creator><creator>Delcher, Arthur L.</creator><creator>Mount, Stephen M.</creator><creator>Wortman, Jennifer R.</creator><creator>Smith Jr, Roger K.</creator><creator>Hannick, Linda I.</creator><creator>Maiti, Rama</creator><creator>Ronning, Catherine M.</creator><creator>Rusch, Douglas B.</creator><creator>Town, Christopher D.</creator><creator>Salzberg, Steven L.</creator><creator>White, Owen</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20031001</creationdate><title>Improving the Arabidopsis genome annotation using maximal transcript alignment 
assemblies</title><author>Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith Jr, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c541t-94fcfe1196a505a022b0cb7627a9fedc0ded6e3be09f6c0aa9fce8f68ef059433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Algorithms</topic><topic>Alternative Splicing</topic><topic>Arabidopsis - genetics</topic><topic>Arabidopsis - metabolism</topic><topic>Arabidopsis thaliana</topic><topic>DNA, Complementary - analysis</topic><topic>Expressed Sequence Tags</topic><topic>Genome, Plant</topic><topic>Introns</topic><topic>Plant Proteins - genetics</topic><topic>RNA, Plant - analysis</topic><topic>RNA, Plant - chemistry</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><topic>Transcription, Genetic</topic><topic>Untranslated Regions</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Haas, Brian J.</creatorcontrib><creatorcontrib>Delcher, Arthur L.</creatorcontrib><creatorcontrib>Mount, Stephen M.</creatorcontrib><creatorcontrib>Wortman, Jennifer R.</creatorcontrib><creatorcontrib>Smith Jr, Roger K.</creatorcontrib><creatorcontrib>Hannick, Linda I.</creatorcontrib><creatorcontrib>Maiti, Rama</creatorcontrib><creatorcontrib>Ronning, Catherine M.</creatorcontrib><creatorcontrib>Rusch, Douglas B.</creatorcontrib><creatorcontrib>Town, Christopher D.</creatorcontrib><creatorcontrib>Salzberg, Steven L.</creatorcontrib><creatorcontrib>White, Owen</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Haas, Brian J.</au><au>Delcher, Arthur L.</au><au>Mount, Stephen M.</au><au>Wortman, Jennifer R.</au><au>Smith Jr, Roger K.</au><au>Hannick, Linda I.</au><au>Maiti, Rama</au><au>Ronning, Catherine M.</au><au>Rusch, Douglas B.</au><au>Town, Christopher D.</au><au>Salzberg, Steven L.</au><au>White, Owen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. 
Acids Res</addtitle><date>2003-10-01</date><risdate>2003</risdate><volume>31</volume><issue>19</issue><spage>5654</spage><epage>5666</epage><pages>5654-5666</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>14500829</pmid><doi>10.1093/nar/gkg770</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2003-10, Vol.31 (19), p.5654-5666 |
issn | 0305-1048 ; 1362-4962 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_206470 |
source | MEDLINE; PubMed (Medline); Oxford Journals Open Access Collection; Free Full-Text Journals in Chemistry |
subjects | Algorithms ; Alternative Splicing ; Arabidopsis - genetics ; Arabidopsis - metabolism ; Arabidopsis thaliana ; DNA, Complementary - analysis ; Expressed Sequence Tags ; Genome, Plant ; Introns ; Plant Proteins - genetics ; RNA, Plant - analysis ; RNA, Plant - chemistry ; Sequence Alignment - methods ; Software ; Transcription, Genetic ; Untranslated Regions |
title | Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T21%3A38%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20the%20Arabidopsis%20genome%20annotation%20using%20maximal%20transcript%20alignment%20assemblies&rft.jtitle=Nucleic%20acids%20research&rft.au=Haas,%20Brian%20J.&rft.date=2003-10-01&rft.volume=31&rft.issue=19&rft.spage=5654&rft.epage=5666&rft.pages=5654-5666&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkg770&rft_dat=%3Cproquest_pubme%3E73671874%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200581585&rft_id=info:pmid/14500829&rfr_iscdi=true |
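
The description field in this record mentions assembling "clusters of overlapping transcript alignments". As a rough illustration of that clustering idea only, the sketch below groups alignments whose genomic spans overlap. It is not the PASA algorithm: the record does not describe the implementation, and the sketch ignores splice-site compatibility and the maximal assembly step. The `Alignment` tuple layout, the function name `cluster_overlapping`, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption, not taken from the paper):
# (name, start, end) genomic span of one transcript alignment on a single
# strand, with inclusive coordinates.
Alignment = Tuple[str, int, int]


def cluster_overlapping(alignments: List[Alignment]) -> List[List[Alignment]]:
    """Group alignments whose spans overlap, directly or transitively."""
    clusters: List[List[Alignment]] = []
    cluster_end = None  # right-most end coordinate in the current cluster
    # Sorting by start lets a single left-to-right pass find every cluster.
    for aln in sorted(alignments, key=lambda a: (a[1], a[2])):
        _, start, end = aln
        if cluster_end is not None and start <= cluster_end:
            # Overlaps the running cluster: extend it.
            clusters[-1].append(aln)
            cluster_end = max(cluster_end, end)
        else:
            # No overlap: start a new cluster.
            clusters.append([aln])
            cluster_end = end
    return clusters


# Made-up example spans, purely illustrative.
example = [
    ("est1", 100, 400),
    ("cdna1", 350, 900),
    ("est2", 1500, 1800),
    ("est3", 850, 1000),
]
clusters = cluster_overlapping(example)
for cluster in clusters:
    print([name for name, _, _ in cluster])
```

A real assembler would additionally need to check that overlapping alignments agree on intron boundaries before merging them, which this sketch leaves out.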