ESTprep: preprocessing cDNA sequence reads

Motivation: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing....

Bibliographic details
Published in: Bioinformatics 2003-07, Vol.19 (11), p.1318-1324
Main authors: Scheetz, Todd E.; Trivedi, Nishank; Roberts, Chad A.; Kucaba, Tamara; Berger, Brian; Robinson, Natalie L.; Birkett, Clayton L.; Gavin, Allen J.; O’Leary, Brian; Braun, Terry A.; Bonaldo, Maria F.; Robinson, John P.; Sheffield, Val C.; Soares, Marcelo B.; Casavant, Thomas L.
Format: Article
Language: English
Online access: Full text
container_end_page 1324
container_issue 11
container_start_page 1318
container_title Bioinformatics
container_volume 19
creator Scheetz, Todd E.
Trivedi, Nishank
Roberts, Chad A.
Kucaba, Tamara
Berger, Brian
Robinson, Natalie L.
Birkett, Clayton L.
Gavin, Allen J.
O’Leary, Brian
Braun, Terry A.
Bonaldo, Maria F.
Robinson, John P.
Sheffield, Val C.
Soares, Marcelo B.
Casavant, Thomas L.
description Motivation: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu
doi_str_mv 10.1093/bioinformatics/btg159
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_73486158</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73486158</sourcerecordid><originalsourceid>FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</originalsourceid><addsrcrecordid>eNqFkE1Lw0AQhhdRbK3-BKUIehBi93s33kRbqxQVrFi8LJvNRKJNUndT0H9vaotFL15mBuaZeWdehPYJPiU4Zr0kr_Iyq3xh69yFXlK_EBFvoDbhEkcUi3izqZlUEdeYtdBOCK8YC8I530YtQrXimNM2Ouk_jGceZmfdRfSVgxDy8qXrLm_PuwHe51A66HqwadhFW5mdBthb5Q56HPTHF8NodHd1fXE-ilwjVUcqkykjVIG0AEwznUgHHIhKpCSpSlOWxTSVzMVYZhQTZamymoFNME95HLMOOl7ubc5p9ENtijw4mE5tCdU8GMW4lkTof0GiNRVC0wY8_AO-VnNfNk8YEmspBKYLWbGEnK9C8JCZmc8L6z8NwWZhufltuVla3swdrJbPkwLS9dTK4wY4WgE2ODvNvC1dHtacoJzrby5acnmo4eOnb_2bkYopYYaTZ3M_wRMsB0_mhn0BvL6cLw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198655029</pqid></control><display><type>article</type><title>ESTprep: preprocessing cDNA sequence reads</title><source>MEDLINE</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><source>Oxford Open Access Journals</source><creator>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</creator><creatorcontrib>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</creatorcontrib><description>Motivation: High accuracy of data always governs the large-scale gene discovery projects. 
The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btg159</identifier><identifier>PMID: 12874042</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Base Sequence ; Biological and medical sciences ; DNA, Complementary - chemistry ; DNA, Complementary - genetics ; Expressed Sequence Tags ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Molecular Sequence Data ; Quality Control ; Sequence Alignment - methods ; Sequence Analysis, DNA - methods ; Software</subject><ispartof>Bioinformatics, 2003-07, Vol.19 (11), p.1318-1324</ispartof><rights>Copyright Oxford University Press(England) Jul 22, 2003</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=15244842$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/12874042$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Scheetz, Todd E.</creatorcontrib><creatorcontrib>Trivedi, Nishank</creatorcontrib><creatorcontrib>Roberts, Chad A.</creatorcontrib><creatorcontrib>Kucaba, Tamara</creatorcontrib><creatorcontrib>Berger, Brian</creatorcontrib><creatorcontrib>Robinson, Natalie L.</creatorcontrib><creatorcontrib>Birkett, Clayton L.</creatorcontrib><creatorcontrib>Gavin, Allen J.</creatorcontrib><creatorcontrib>O’Leary, Brian</creatorcontrib><creatorcontrib>Braun, Terry A.</creatorcontrib><creatorcontrib>Bonaldo, Maria F.</creatorcontrib><creatorcontrib>Robinson, John P.</creatorcontrib><creatorcontrib>Sheffield, Val C.</creatorcontrib><creatorcontrib>Soares, Marcelo B.</creatorcontrib><creatorcontrib>Casavant, Thomas L.</creatorcontrib><title>ESTprep: preprocessing cDNA sequence reads</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: High accuracy of data always governs the large-scale gene discovery 
projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Biological and medical sciences</subject><subject>DNA, Complementary - chemistry</subject><subject>DNA, Complementary - genetics</subject><subject>Expressed Sequence Tags</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Molecular Sequence Data</subject><subject>Quality Control</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Software</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkE1Lw0AQhhdRbK3-BKUIehBi93s33kRbqxQVrFi8LJvNRKJNUndT0H9vaotFL15mBuaZeWdehPYJPiU4Zr0kr_Iyq3xh69yFXlK_EBFvoDbhEkcUi3izqZlUEdeYtdBOCK8YC8I530YtQrXimNM2Ouk_jGceZmfdRfSVgxDy8qXrLm_PuwHe51A66HqwadhFW5mdBthb5Q56HPTHF8NodHd1fXE-ilwjVUcqkykjVIG0AEwznUgHHIhKpCSpSlOWxTSVzMVYZhQTZamymoFNME95HLMOOl7ubc5p9ENtijw4mE5tCdU8GMW4lkTof0GiNRVC0wY8_AO-VnNfNk8YEmspBKYLWbGEnK9C8JCZmc8L6z8NwWZhufltuVla3swdrJbPkwLS9dTK4wY4WgE2ODvNvC1dHtacoJzrby5acnmo4eOnb_2bkYopYYaTZ3M_wRMsB0_mhn0BvL6cLw</recordid><startdate>20030722</startdate><enddate>20030722</enddate><creator>Scheetz, Todd E.</creator><creator>Trivedi, Nishank</creator><creator>Roberts, Chad A.</creator><creator>Kucaba, Tamara</creator><creator>Berger, Brian</creator><creator>Robinson, Natalie L.</creator><creator>Birkett, Clayton L.</creator><creator>Gavin, Allen J.</creator><creator>O’Leary, Brian</creator><creator>Braun, Terry A.</creator><creator>Bonaldo, Maria F.</creator><creator>Robinson, John P.</creator><creator>Sheffield, Val C.</creator><creator>Soares, Marcelo B.</creator><creator>Casavant, Thomas L.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20030722</creationdate><title>ESTprep: preprocessing cDNA sequence reads</title><author>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Biological and medical sciences</topic><topic>DNA, Complementary - chemistry</topic><topic>DNA, Complementary - genetics</topic><topic>Expressed Sequence Tags</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Molecular Sequence Data</topic><topic>Quality Control</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Scheetz, Todd E.</creatorcontrib><creatorcontrib>Trivedi, Nishank</creatorcontrib><creatorcontrib>Roberts, Chad A.</creatorcontrib><creatorcontrib>Kucaba, Tamara</creatorcontrib><creatorcontrib>Berger, Brian</creatorcontrib><creatorcontrib>Robinson, Natalie L.</creatorcontrib><creatorcontrib>Birkett, Clayton L.</creatorcontrib><creatorcontrib>Gavin, Allen J.</creatorcontrib><creatorcontrib>O’Leary, Brian</creatorcontrib><creatorcontrib>Braun, Terry A.</creatorcontrib><creatorcontrib>Bonaldo, Maria F.</creatorcontrib><creatorcontrib>Robinson, John P.</creatorcontrib><creatorcontrib>Sheffield, Val C.</creatorcontrib><creatorcontrib>Soares, Marcelo B.</creatorcontrib><creatorcontrib>Casavant, Thomas L.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and 
Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Scheetz, Todd E.</au><au>Trivedi, Nishank</au><au>Roberts, Chad A.</au><au>Kucaba, Tamara</au><au>Berger, Brian</au><au>Robinson, Natalie L.</au><au>Birkett, Clayton L.</au><au>Gavin, Allen J.</au><au>O’Leary, Brian</au><au>Braun, Terry A.</au><au>Bonaldo, Maria F.</au><au>Robinson, John P.</au><au>Sheffield, Val C.</au><au>Soares, Marcelo B.</au><au>Casavant, Thomas L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ESTprep: preprocessing cDNA sequence reads</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2003-07-22</date><risdate>2003</risdate><volume>19</volume><issue>11</issue><spage>1318</spage><epage>1324</epage><pages>1318-1324</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: High accuracy of data always 
governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>12874042</pmid><doi>10.1093/bioinformatics/btg159</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2003-07, Vol.19 (11), p.1318-1324
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_73486158
source MEDLINE; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection; Oxford Open Access Journals
subjects Algorithms
Base Sequence
Biological and medical sciences
DNA, Complementary - chemistry
DNA, Complementary - genetics
Expressed Sequence Tags
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling - methods
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Molecular Sequence Data
Quality Control
Sequence Alignment - methods
Sequence Analysis, DNA - methods
Software
title ESTprep: preprocessing cDNA sequence reads
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T01%3A00%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ESTprep:%20preprocessing%20cDNA%20sequence%20reads&rft.jtitle=Bioinformatics&rft.au=Scheetz,%20Todd%20E.&rft.date=2003-07-22&rft.volume=19&rft.issue=11&rft.spage=1318&rft.epage=1324&rft.pages=1318-1324&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btg159&rft_dat=%3Cproquest_cross%3E73486158%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198655029&rft_id=info:pmid/12874042&rfr_iscdi=true