ESTprep: preprocessing cDNA sequence reads
Motivation: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing....
Published in: | Bioinformatics 2003-07, Vol.19 (11), p.1318-1324 |
---|---|
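Main authors: | Scheetz, Todd E.; Trivedi, Nishank; Roberts, Chad A.; Kucaba, Tamara; Berger, Brian; Robinson, Natalie L.; Birkett, Clayton L.; Gavin, Allen J.; O’Leary, Brian; Braun, Terry A.; Bonaldo, Maria F.; Robinson, John P.; Sheffield, Val C.; Soares, Marcelo B.; Casavant, Thomas L. |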
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1324 |
---|---|
container_issue | 11 |
container_start_page | 1318 |
container_title | Bioinformatics |
container_volume | 19 |
creator | Scheetz, Todd E.; Trivedi, Nishank; Roberts, Chad A.; Kucaba, Tamara; Berger, Brian; Robinson, Natalie L.; Birkett, Clayton L.; Gavin, Allen J.; O’Leary, Brian; Braun, Terry A.; Bonaldo, Maria F.; Robinson, John P.; Sheffield, Val C.; Soares, Marcelo B.; Casavant, Thomas L. |
description | Motivation: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu |
doi_str_mv | 10.1093/bioinformatics/btg159 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_73486158</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73486158</sourcerecordid><originalsourceid>FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</originalsourceid><addsrcrecordid>eNqFkE1Lw0AQhhdRbK3-BKUIehBi93s33kRbqxQVrFi8LJvNRKJNUndT0H9vaotFL15mBuaZeWdehPYJPiU4Zr0kr_Iyq3xh69yFXlK_EBFvoDbhEkcUi3izqZlUEdeYtdBOCK8YC8I530YtQrXimNM2Ouk_jGceZmfdRfSVgxDy8qXrLm_PuwHe51A66HqwadhFW5mdBthb5Q56HPTHF8NodHd1fXE-ilwjVUcqkykjVIG0AEwznUgHHIhKpCSpSlOWxTSVzMVYZhQTZamymoFNME95HLMOOl7ubc5p9ENtijw4mE5tCdU8GMW4lkTof0GiNRVC0wY8_AO-VnNfNk8YEmspBKYLWbGEnK9C8JCZmc8L6z8NwWZhufltuVla3swdrJbPkwLS9dTK4wY4WgE2ODvNvC1dHtacoJzrby5acnmo4eOnb_2bkYopYYaTZ3M_wRMsB0_mhn0BvL6cLw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198655029</pqid></control><display><type>article</type><title>ESTprep: preprocessing cDNA sequence reads</title><source>MEDLINE</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><source>Oxford Open Access Journals</source><creator>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</creator><creatorcontrib>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</creatorcontrib><description>Motivation: High accuracy of data always governs the large-scale gene discovery projects. 
The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btg159</identifier><identifier>PMID: 12874042</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Base Sequence ; Biological and medical sciences ; DNA, Complementary - chemistry ; DNA, Complementary - genetics ; Expressed Sequence Tags ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Molecular Sequence Data ; Quality Control ; Sequence Alignment - methods ; Sequence Analysis, DNA - methods ; Software</subject><ispartof>Bioinformatics, 2003-07, Vol.19 (11), p.1318-1324</ispartof><rights>Copyright Oxford University Press(England) Jul 22, 2003</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=15244842$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/12874042$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Scheetz, Todd E.</creatorcontrib><creatorcontrib>Trivedi, Nishank</creatorcontrib><creatorcontrib>Roberts, Chad A.</creatorcontrib><creatorcontrib>Kucaba, Tamara</creatorcontrib><creatorcontrib>Berger, Brian</creatorcontrib><creatorcontrib>Robinson, Natalie L.</creatorcontrib><creatorcontrib>Birkett, Clayton L.</creatorcontrib><creatorcontrib>Gavin, Allen J.</creatorcontrib><creatorcontrib>O’Leary, Brian</creatorcontrib><creatorcontrib>Braun, Terry A.</creatorcontrib><creatorcontrib>Bonaldo, Maria F.</creatorcontrib><creatorcontrib>Robinson, John P.</creatorcontrib><creatorcontrib>Sheffield, Val C.</creatorcontrib><creatorcontrib>Soares, Marcelo B.</creatorcontrib><creatorcontrib>Casavant, Thomas L.</creatorcontrib><title>ESTprep: preprocessing cDNA sequence reads</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: High accuracy of data always governs the large-scale gene discovery 
projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Biological and medical sciences</subject><subject>DNA, Complementary - chemistry</subject><subject>DNA, Complementary - genetics</subject><subject>Expressed Sequence Tags</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Molecular Sequence Data</subject><subject>Quality Control</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Software</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkE1Lw0AQhhdRbK3-BKUIehBi93s33kRbqxQVrFi8LJvNRKJNUndT0H9vaotFL15mBuaZeWdehPYJPiU4Zr0kr_Iyq3xh69yFXlK_EBFvoDbhEkcUi3izqZlUEdeYtdBOCK8YC8I530YtQrXimNM2Ouk_jGceZmfdRfSVgxDy8qXrLm_PuwHe51A66HqwadhFW5mdBthb5Q56HPTHF8NodHd1fXE-ilwjVUcqkykjVIG0AEwznUgHHIhKpCSpSlOWxTSVzMVYZhQTZamymoFNME95HLMOOl7ubc5p9ENtijw4mE5tCdU8GMW4lkTof0GiNRVC0wY8_AO-VnNfNk8YEmspBKYLWbGEnK9C8JCZmc8L6z8NwWZhufltuVla3swdrJbPkwLS9dTK4wY4WgE2ODvNvC1dHtacoJzrby5acnmo4eOnb_2bkYopYYaTZ3M_wRMsB0_mhn0BvL6cLw</recordid><startdate>20030722</startdate><enddate>20030722</enddate><creator>Scheetz, Todd E.</creator><creator>Trivedi, Nishank</creator><creator>Roberts, Chad A.</creator><creator>Kucaba, Tamara</creator><creator>Berger, Brian</creator><creator>Robinson, Natalie L.</creator><creator>Birkett, Clayton L.</creator><creator>Gavin, Allen J.</creator><creator>O’Leary, Brian</creator><creator>Braun, Terry A.</creator><creator>Bonaldo, Maria F.</creator><creator>Robinson, John P.</creator><creator>Sheffield, Val C.</creator><creator>Soares, Marcelo B.</creator><creator>Casavant, Thomas L.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20030722</creationdate><title>ESTprep: preprocessing cDNA sequence reads</title><author>Scheetz, Todd E. ; Trivedi, Nishank ; Roberts, Chad A. ; Kucaba, Tamara ; Berger, Brian ; Robinson, Natalie L. ; Birkett, Clayton L. ; Gavin, Allen J. ; O’Leary, Brian ; Braun, Terry A. ; Bonaldo, Maria F. ; Robinson, John P. ; Sheffield, Val C. ; Soares, Marcelo B. ; Casavant, Thomas L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c480t-7f6d3127e6aee3838b6ce4e17b661d7dd3f92d63c906f2017a27a83eab04d4993</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Biological and medical sciences</topic><topic>DNA, Complementary - chemistry</topic><topic>DNA, Complementary - genetics</topic><topic>Expressed Sequence Tags</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Molecular Sequence Data</topic><topic>Quality Control</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Scheetz, Todd E.</creatorcontrib><creatorcontrib>Trivedi, Nishank</creatorcontrib><creatorcontrib>Roberts, Chad A.</creatorcontrib><creatorcontrib>Kucaba, Tamara</creatorcontrib><creatorcontrib>Berger, Brian</creatorcontrib><creatorcontrib>Robinson, Natalie L.</creatorcontrib><creatorcontrib>Birkett, Clayton L.</creatorcontrib><creatorcontrib>Gavin, Allen J.</creatorcontrib><creatorcontrib>O’Leary, Brian</creatorcontrib><creatorcontrib>Braun, Terry A.</creatorcontrib><creatorcontrib>Bonaldo, Maria F.</creatorcontrib><creatorcontrib>Robinson, John P.</creatorcontrib><creatorcontrib>Sheffield, Val C.</creatorcontrib><creatorcontrib>Soares, Marcelo B.</creatorcontrib><creatorcontrib>Casavant, Thomas L.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and 
Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Scheetz, Todd E.</au><au>Trivedi, Nishank</au><au>Roberts, Chad A.</au><au>Kucaba, Tamara</au><au>Berger, Brian</au><au>Robinson, Natalie L.</au><au>Birkett, Clayton L.</au><au>Gavin, Allen J.</au><au>O’Leary, Brian</au><au>Braun, Terry A.</au><au>Bonaldo, Maria F.</au><au>Robinson, John P.</au><au>Sheffield, Val C.</au><au>Soares, Marcelo B.</au><au>Casavant, Thomas L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ESTprep: preprocessing cDNA sequence reads</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2003-07-22</date><risdate>2003</risdate><volume>19</volume><issue>11</issue><spage>1318</spage><epage>1324</epage><pages>1318-1324</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: High accuracy of data always governs 
the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html Contact: tscheetz@eng.uiowa.edu * To whom correspondence should be addressed.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>12874042</pmid><doi>10.1093/bioinformatics/btg159</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2003-07, Vol.19 (11), p.1318-1324 |
issn | 1367-4803; 1460-2059; 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_73486158 |
source | MEDLINE; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection; Oxford Open Access Journals |
subjects | Algorithms; Base Sequence; Biological and medical sciences; DNA, Complementary - chemistry; DNA, Complementary - genetics; Expressed Sequence Tags; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Quality Control; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Software |
title | ESTprep: preprocessing cDNA sequence reads |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T01%3A00%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ESTprep:%20preprocessing%20cDNA%20sequence%20reads&rft.jtitle=Bioinformatics&rft.au=Scheetz,%20Todd%20E.&rft.date=2003-07-22&rft.volume=19&rft.issue=11&rft.spage=1318&rft.epage=1324&rft.pages=1318-1324&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btg159&rft_dat=%3Cproquest_cross%3E73486158%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198655029&rft_id=info:pmid/12874042&rfr_iscdi=true |