VA-Store: A Virtual Approximate Store Approach to Supporting Repetitive Big Data in Genome Sequence Analyses
In recent years, we have witnessed an increasing demand to process big data in numerous applications. It is observed that there often exist substantial amounts of repetitive data in different portions of a big data repository/dataset for applications such as genome sequence analyses. In this paper,...
Published in: | IEEE transactions on knowledge and data engineering 2020-03, Vol.32 (3), p.602-616 |
---|---|
Main authors: | Liu, Xianying ; Zhu, Qiang ; Pramanik, Sakti ; Brown, C. Titus ; Qian, Gang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 616 |
---|---|
container_issue | 3 |
container_start_page | 602 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 32 |
creator | Liu, Xianying ; Zhu, Qiang ; Pramanik, Sakti ; Brown, C. Titus ; Qian, Gang |
description | In recent years, we have witnessed an increasing demand to process big data in numerous applications. It is observed that there often exist substantial amounts of repetitive data in different portions of a big data repository/dataset for applications such as genome sequence analyses. In this paper, we present a novel method, called the VA-Store, to reduce the large space requirement for repetitive data in prevailing genome sequence analysis tasks using k-mers (i.e., subsequences of length k) with multiple k values. The VA-Store maintains a physical store for one portion of the input dataset (i.e., k₀-mers) and supports multiple virtual stores for other portions of the dataset (i.e., k-mers with k ≠ k₀). Utilizing important relationships among repetitive data, the VA-Store transforms a given query on a virtual store into one or more queries on the physical store for execution. Both precise and approximate transformations are considered. Accuracy estimation models for approximate solutions are derived. Query optimization strategies are suggested to improve query performance. Our experiments using real and synthetic datasets demonstrate that the VA-Store is quite promising in providing effective storage and efficient query processing for solving a kernel database problem on repetitive big data for genome sequence analysis applications. |
doi_str_mv | 10.1109/TKDE.2018.2885952 |
format | Article |
publisher | LOS ALAMITOS: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
coden | ITKEEH |
orcid | https://orcid.org/0000-0002-5658-5875 ; https://orcid.org/0000-0001-7094-9236 |
tpages | 15 |
toplevel | peer_reviewed ; online_resources |
oa | free_for_read |
ieee_id | 8573155 |
linktohtml | https://ieeexplore.ieee.org/document/8573155 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2020-03, Vol.32 (3), p.602-616 |
issn | 1041-4347 ; 1558-2191 |
language | eng |
recordid | cdi_webofscience_primary_000526526700014 |
source | IEEE Electronic Library (IEL) |
subjects | algorithms for data and knowledge management ; Big Data ; Bioinformatics ; Bioinformatics (genome or protein) databases ; Computer Science ; Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Data analysis ; data storage representations ; Datasets ; Engineering ; Engineering, Electrical & Electronic ; Genomes ; Genomics ; Model accuracy ; Optimization ; Queries ; Query processing ; Science & Technology ; Search problems ; Sequences ; Sequential analysis ; Technology |
title | VA-Store: A Virtual Approximate Store Approach to Supporting Repetitive Big Data in Genome Sequence Analyses |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T00%3A40%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=VA-Store:%20A%20Virtual%20Approximate%20Store%20Approach%20to%20Supporting%20Repetitive%20Big%20Data%20in%20Genome%20Sequence%20Analyses&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Liu,%20Xianying&rft.date=2020-03-01&rft.volume=32&rft.issue=3&rft.spage=602&rft.epage=616&rft.pages=602-616&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2018.2885952&rft_dat=%3Cproquest_RIE%3E2352189460%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2352189460&rft_id=info:pmid/&rft_ieee_id=8573155&rfr_iscdi=true |
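
The abstract describes answering queries on "virtual" k-mer stores by rewriting them as queries on a single physical store of k₀-mers. The record does not give the paper's actual transformations or accuracy models, so the following is only a minimal sketch of the general idea under stated assumptions: the class `PhysicalStore`, the function `virtual_count`, and both rewriting rules (minimum over constituent k₀-mers for k ≥ k₀, prefix sum for k < k₀) are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def kmers(seq, k):
    """Return all length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class PhysicalStore:
    """Hypothetical physical store: occurrence counts of k0-mers only."""

    def __init__(self, reads, k0):
        self.k0 = k0
        self.counts = Counter(km for read in reads for km in kmers(read, k0))

    def count(self, kmer):
        # Counter returns 0 for k0-mers that never occur.
        return self.counts[kmer]


def virtual_count(store, query):
    """Estimate the occurrence count of a k-mer using only the k0-mer store.

    Assumed rules (illustrative, not taken from the paper):
    - k >= k0: every occurrence of the query contains each of its k0-mers,
      so the minimum k0-mer count is an upper bound (exact when k == k0).
    - k <  k0: sum the counts of stored k0-mers that start with the query;
      occurrences near the end of a read are missed, so this is a lower bound.
    """
    k0 = store.k0
    if len(query) >= k0:
        return min(store.count(km) for km in kmers(query, k0))
    return sum(c for km, c in store.counts.items() if km.startswith(query))


reads = ["ACGTACGT", "CGTAC"]
store = PhysicalStore(reads, k0=3)

print(virtual_count(store, "ACGT"))   # 2 (matches the true count)
print(virtual_count(store, "GTACG"))  # 2 (true count is 1: an overestimate)
print(virtual_count(store, "GT"))     # 2 (true count is 3: an underestimate)
```

The two inexact answers show why an approach of this kind needs accuracy estimation: the rewritten query can be cheap to run on the physical store while only bounding the true count.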