An ACGT-Words Tree for Efficient Data Access in Genomic Databases

Genomic sequence databases, such as GenBank and EMBL, are widely used by molecular biologists for homology searching. As these databases grow, indexing the sequences for fast queries becomes increasingly important. In this paper, we propose a new index structure, ACGT-Wo...

Full description

Bibliographic details
Main authors:	Ye-In Chang, Wei-Horng Yeh, Jiun-Rung Chen, Jen-Wei Hu
Format:	Conference proceeding
Language:	eng
Subjects:	Bioinformatics; Computational biology; Computational intelligence; Data structures; DNA; Genomics; Indexing; Sequences; Tree data structures
Online access:	Order full text

container_end_page	150
container_issue
container_start_page	143
container_title
container_volume
creator	Ye-In Chang; Wei-Horng Yeh; Jiun-Rung Chen; Jen-Wei Hu
description	Genomic sequence databases, such as GenBank and EMBL, are widely used by molecular biologists for homology searching. As these databases grow, indexing the sequences for fast queries becomes increasingly important. In this paper, we propose a new index structure, the ACGT-Words tree, to efficiently support query processing in genomic databases. We define a concept of words that differs from the word definition used in the word suffix tree, and we separate both the DNA sequences stored in the database and the query sequence into distinct words. Our approach does not store all of the suffixes of the database sequences and therefore needs less space than the suffix tree approach. We also propose an efficient search algorithm that performs sequence matching on the ACGT-Words tree index structure and therefore takes less time to finish a search than the suffix array approach. Moreover, our approach avoids the missing cases that occur in the word suffix tree. The simulation results show that our ACGT-Words tree outperforms the suffix tree in terms of storage and the suffix array in terms of processing time.
doi_str_mv	10.1109/CIBCB.2007.4221216
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4221216</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4221216</ieee_id><sourcerecordid>4221216</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-ed5ac75d4a8a5e69fa0cf781c3cb20628d3b50340a77eb720497f486dbf80a903</originalsourceid><addsrcrecordid>eNotj81Kw0AUhQdEqNa-gG7mBRLv_GUmy2mssVBwE-my3EzuwIhNJJONb2_Rns2B78AHh7FHAaUQUD83-22zLSWALbWUQorqht0LLbUGe9lXbJPzJ1yijXKVvGPej9w3bVccp3nIvJuJeJxmvosxhUTjwl9wQe5DoJx5GnlL43RO4Q_3mCk_sNuIX5k2116zj9dd17wVh_d23_hDkYQ1S0GDwWDNoNGhoaqOCCFaJ4IKvYRKukH1BpQGtJZ6K0HXNmpXDX10gDWoNXv69yYiOn3P6Yzzz-n6Uv0CFFdGQw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>An ACGT-Words Tree for Efficient Data Access in Genomic Databases</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Ye-In Chang ; Wei-Horng Yeh ; Jiun-Rung Chen ; Jen-Wei Hu</creator><creatorcontrib>Ye-In Chang ; Wei-Horng Yeh ; Jiun-Rung Chen ; Jen-Wei Hu</creatorcontrib><description>Genomic sequence databases, like GenBank, EMBL, are widely used by molecular biologists for homology searching. Because of the increase of the size of genomic sequence databases, the importance of indexing the sequences for fast queries grows. In this paper, we propose a new index structure, ACGT-Words tree, for efficiently support query processing in genomic databases. We define the concept of words which is different from the word definition given in the word suffix tree, and separate the DNA sequences stored in the database and in the query sequence into distinct words. Our approach does not store all of the suffixes in the database sequences. Therefore, we need less space than the suffix tree approach. We also propose an efficient search algorithm to do the sequence match based on the ACGT-Words tree index structure. 
Therefore, we could take less time to finish the search than the suffix array approach. Moreover, our approach avoids the missing cases occurring in the word suffix tree. The simulation results show that our ACGT-Words tree outperforms the suffix tree and the suffix array in terms of storage and processing time, respectively</description><identifier>ISBN: 1424407109</identifier><identifier>ISBN: 9781424407101</identifier><identifier>DOI: 10.1109/CIBCB.2007.4221216</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bioinformatics ; Computational biology ; Computational intelligence ; Data structures ; DNA ; Genomics ; Indexing ; Sequences ; Tree data structures</subject><ispartof>2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, 2007, p.143-150</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4221216$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4221216$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ye-In Chang</creatorcontrib><creatorcontrib>Wei-Horng Yeh</creatorcontrib><creatorcontrib>Jiun-Rung Chen</creatorcontrib><creatorcontrib>Jen-Wei Hu</creatorcontrib><title>An ACGT-Words Tree for Efficient Data Access in Genomic Databases</title><title>2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology</title><addtitle>CIBCB</addtitle><description>Genomic sequence databases, like GenBank, EMBL, are widely used by molecular biologists for homology searching. 
Because of the increase of the size of genomic sequence databases, the importance of indexing the sequences for fast queries grows. In this paper, we propose a new index structure, ACGT-Words tree, for efficiently support query processing in genomic databases. We define the concept of words which is different from the word definition given in the word suffix tree, and separate the DNA sequences stored in the database and in the query sequence into distinct words. Our approach does not store all of the suffixes in the database sequences. Therefore, we need less space than the suffix tree approach. We also propose an efficient search algorithm to do the sequence match based on the ACGT-Words tree index structure. Therefore, we could take less time to finish the search than the suffix array approach. Moreover, our approach avoids the missing cases occurring in the word suffix tree. The simulation results show that our ACGT-Words tree outperforms the suffix tree and the suffix array in terms of storage and processing time, respectively</description><subject>Bioinformatics</subject><subject>Computational biology</subject><subject>Computational intelligence</subject><subject>Data structures</subject><subject>DNA</subject><subject>Genomics</subject><subject>Indexing</subject><subject>Sequences</subject><subject>Tree data structures</subject><isbn>1424407109</isbn><isbn>9781424407101</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2007</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotj81Kw0AUhQdEqNa-gG7mBRLv_GUmy2mssVBwE-my3EzuwIhNJJONb2_Rns2B78AHh7FHAaUQUD83-22zLSWALbWUQorqht0LLbUGe9lXbJPzJ1yijXKVvGPej9w3bVccp3nIvJuJeJxmvosxhUTjwl9wQe5DoJx5GnlL43RO4Q_3mCk_sNuIX5k2116zj9dd17wVh_d23_hDkYQ1S0GDwWDNoNGhoaqOCCFaJ4IKvYRKukH1BpQGtJZ6K0HXNmpXDX10gDWoNXv69yYiOn3P6Yzzz-n6Uv0CFFdGQw</recordid><startdate>200704</startdate><enddate>200704</enddate><creator>Ye-In 
Chang</creator><creator>Wei-Horng Yeh</creator><creator>Jiun-Rung Chen</creator><creator>Jen-Wei Hu</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200704</creationdate><title>An ACGT-Words Tree for Efficient Data Access in Genomic Databases</title><author>Ye-In Chang ; Wei-Horng Yeh ; Jiun-Rung Chen ; Jen-Wei Hu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-ed5ac75d4a8a5e69fa0cf781c3cb20628d3b50340a77eb720497f486dbf80a903</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Bioinformatics</topic><topic>Computational biology</topic><topic>Computational intelligence</topic><topic>Data structures</topic><topic>DNA</topic><topic>Genomics</topic><topic>Indexing</topic><topic>Sequences</topic><topic>Tree data structures</topic><toplevel>online_resources</toplevel><creatorcontrib>Ye-In Chang</creatorcontrib><creatorcontrib>Wei-Horng Yeh</creatorcontrib><creatorcontrib>Jiun-Rung Chen</creatorcontrib><creatorcontrib>Jen-Wei Hu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ye-In Chang</au><au>Wei-Horng Yeh</au><au>Jiun-Rung Chen</au><au>Jen-Wei Hu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>An ACGT-Words Tree for Efficient Data Access in Genomic Databases</atitle><btitle>2007 IEEE Symposium on Computational Intelligence and 
Bioinformatics and Computational Biology</btitle><stitle>CIBCB</stitle><date>2007-04</date><risdate>2007</risdate><spage>143</spage><epage>150</epage><pages>143-150</pages><isbn>1424407109</isbn><isbn>9781424407101</isbn><abstract>Genomic sequence databases, like GenBank, EMBL, are widely used by molecular biologists for homology searching. Because of the increase of the size of genomic sequence databases, the importance of indexing the sequences for fast queries grows. In this paper, we propose a new index structure, ACGT-Words tree, for efficiently support query processing in genomic databases. We define the concept of words which is different from the word definition given in the word suffix tree, and separate the DNA sequences stored in the database and in the query sequence into distinct words. Our approach does not store all of the suffixes in the database sequences. Therefore, we need less space than the suffix tree approach. We also propose an efficient search algorithm to do the sequence match based on the ACGT-Words tree index structure. Therefore, we could take less time to finish the search than the suffix array approach. Moreover, our approach avoids the missing cases occurring in the word suffix tree. The simulation results show that our ACGT-Words tree outperforms the suffix tree and the suffix array in terms of storage and processing time, respectively</abstract><pub>IEEE</pub><doi>10.1109/CIBCB.2007.4221216</doi><tpages>8</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 1424407109
ispartof	2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, 2007, p.143-150
issn
language	eng
recordid	cdi_ieee_primary_4221216
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Bioinformatics; Computational biology; Computational intelligence; Data structures; DNA; Genomics; Indexing; Sequences; Tree data structures
title	An ACGT-Words Tree for Efficient Data Access in Genomic Databases
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T17%3A07%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=An%20ACGT-Words%20Tree%20for%20Efficient%20Data%20Access%20in%20Genomic%20Databases&rft.btitle=2007%20IEEE%20Symposium%20on%20Computational%20Intelligence%20and%20Bioinformatics%20and%20Computational%20Biology&rft.au=Ye-In%20Chang&rft.date=2007-04&rft.spage=143&rft.epage=150&rft.pages=143-150&rft.isbn=1424407109&rft.isbn_list=9781424407101&rft_id=info:doi/10.1109/CIBCB.2007.4221216&rft_dat=%3Cieee_6IE%3E4221216%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4221216&rfr_iscdi=true