RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets ar...

Bibliographic details
Published in: IEEE transactions on visualization and computer graphics 2020-01, Vol.26 (1), p.1161-1171
Main authors: Mei, Honghui, Chen, Wei, Wei, Yating, Hu, Yuanzhe, Zhou, Shuyue, Lin, Bingru, Zhao, Ying, Xia, Jiazhi
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1171
container_issue 1
container_start_page 1161
container_title IEEE transactions on visualization and computer graphics
container_volume 26
creator Mei, Honghui
Chen, Wei
Wei, Yating
Hu, Yuanzhe
Zhou, Shuyue
Lin, Bingru
Zhao, Ying
Xia, Jiazhi
description Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are constantly extremely large; thus, the response time should be accelerated by calculating predefined data cubes. However, the queries are limited to the predefined binning schema of preprocessed data cubes. Such limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.
doi_str_mv 10.1109/TVCG.2019.2934800
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_8807303</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8807303</ieee_id><sourcerecordid>2280545711</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-7ec4daa8ada4cb6a8c8e388bd2f06169672d736886983a5049da608d2862ad843</originalsourceid><addsrcrecordid>eNpdkUtrG0EQhIeQEDt2foAxmIFcclm556F55CbkRwICY1vxdejd7TVjVlp5ZpfY_z6rSPYhpy6or5qmi7ETARMhwJ8vH-bXEwnCT6RX2gF8YIfCa1HAFMzHUYO1hTTSHLAvOT8BCK2d_8wO1CgUSHnI4t39bJmIfvCLmPsUy6GP3bqY_cFE_AJ75He0SZRp3ePW4V3DF5geqbivsCW-xHJoMf1DM_WZN13iVy29xHJ0H2IesOW3A6XXY_apwTbT1_08Yr-vLpfzn8Xi5vrXfLYoKqV9X1iqdI3osEZdlQZd5Ug5V9ayASOMN1bWVhnnjHcKp6B9jQZcLZ2RWDutjtj33d5N6p4Hyn1YxVxR2-KauiEHKR1M9dQKMaLf_kOfuiGtx-uCVMJaaaU0IyV2VJW6nBM1YZPiCtNrEBC2PYRtD2HbQ9j3MGbO9puHckX1e-Lt8SNwugMiEb3bzoFVoNRfIDeLAg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2317727226</pqid></control><display><type>article</type><title>RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query</title><source>IEEE Electronic Library (IEL)</source><creator>Mei, Honghui ; Chen, Wei ; Wei, Yating ; Hu, Yuanzhe ; Zhou, Shuyue ; Lin, Bingru ; Zhao, Ying ; Xia, Jiazhi</creator><creatorcontrib>Mei, Honghui ; Chen, Wei ; Wei, Yating ; Hu, Yuanzhe ; Zhou, Shuyue ; Lin, Bingru ; Zhao, Ying ; Xia, Jiazhi</creatorcontrib><description>Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are constantly extremely large; thus, the response time should be accelerated by calculating predefined data cubes. However, the queries are limited to the predefined binning schema of preprocessed data cubes. 
Such limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.</description><identifier>ISSN: 1077-2626</identifier><identifier>EISSN: 1941-0506</identifier><identifier>DOI: 10.1109/TVCG.2019.2934800</identifier><identifier>PMID: 31443022</identifier><identifier>CODEN: ITVGEA</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Aggregate query ; Aggregates ; Complexity ; Cubes ; Data visualization ; Datasets ; hashing ; Histograms ; large-scale data visualization ; Queries ; R-tree ; Random access ; Response time (computers) ; Social networking (online) ; Specifications ; summed area table ; Tables (data) ; Time factors ; Visual databases ; visual query ; Visualization</subject><ispartof>IEEE transactions on visualization and computer graphics, 2020-01, Vol.26 (1), p.1161-1171</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-7ec4daa8ada4cb6a8c8e388bd2f06169672d736886983a5049da608d2862ad843</citedby><cites>FETCH-LOGICAL-c349t-7ec4daa8ada4cb6a8c8e388bd2f06169672d736886983a5049da608d2862ad843</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8807303$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8807303$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31443022$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mei, Honghui</creatorcontrib><creatorcontrib>Chen, Wei</creatorcontrib><creatorcontrib>Wei, Yating</creatorcontrib><creatorcontrib>Hu, Yuanzhe</creatorcontrib><creatorcontrib>Zhou, Shuyue</creatorcontrib><creatorcontrib>Lin, Bingru</creatorcontrib><creatorcontrib>Zhao, Ying</creatorcontrib><creatorcontrib>Xia, Jiazhi</creatorcontrib><title>RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query</title><title>IEEE transactions on visualization and computer graphics</title><addtitle>TVCG</addtitle><addtitle>IEEE Trans Vis Comput Graph</addtitle><description>Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are constantly extremely large; thus, the response time should be accelerated by calculating predefined data cubes. 
However, the queries are limited to the predefined binning schema of preprocessed data cubes. Such limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.</description><subject>Aggregate query</subject><subject>Aggregates</subject><subject>Complexity</subject><subject>Cubes</subject><subject>Data visualization</subject><subject>Datasets</subject><subject>hashing</subject><subject>Histograms</subject><subject>large-scale data visualization</subject><subject>Queries</subject><subject>R-tree</subject><subject>Random access</subject><subject>Response time (computers)</subject><subject>Social networking (online)</subject><subject>Specifications</subject><subject>summed area table</subject><subject>Tables (data)</subject><subject>Time factors</subject><subject>Visual databases</subject><subject>visual 
query</subject><subject>Visualization</subject><issn>1077-2626</issn><issn>1941-0506</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkUtrG0EQhIeQEDt2foAxmIFcclm556F55CbkRwICY1vxdejd7TVjVlp5ZpfY_z6rSPYhpy6or5qmi7ETARMhwJ8vH-bXEwnCT6RX2gF8YIfCa1HAFMzHUYO1hTTSHLAvOT8BCK2d_8wO1CgUSHnI4t39bJmIfvCLmPsUy6GP3bqY_cFE_AJ75He0SZRp3ePW4V3DF5geqbivsCW-xHJoMf1DM_WZN13iVy29xHJ0H2IesOW3A6XXY_apwTbT1_08Yr-vLpfzn8Xi5vrXfLYoKqV9X1iqdI3osEZdlQZd5Ug5V9ayASOMN1bWVhnnjHcKp6B9jQZcLZ2RWDutjtj33d5N6p4Hyn1YxVxR2-KauiEHKR1M9dQKMaLf_kOfuiGtx-uCVMJaaaU0IyV2VJW6nBM1YZPiCtNrEBC2PYRtD2HbQ9j3MGbO9puHckX1e-Lt8SNwugMiEb3bzoFVoNRfIDeLAg</recordid><startdate>202001</startdate><enddate>202001</enddate><creator>Mei, Honghui</creator><creator>Chen, Wei</creator><creator>Wei, Yating</creator><creator>Hu, Yuanzhe</creator><creator>Zhou, Shuyue</creator><creator>Lin, Bingru</creator><creator>Zhao, Ying</creator><creator>Xia, Jiazhi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>202001</creationdate><title>RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query</title><author>Mei, Honghui ; Chen, Wei ; Wei, Yating ; Hu, Yuanzhe ; Zhou, Shuyue ; Lin, Bingru ; Zhao, Ying ; Xia, Jiazhi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-7ec4daa8ada4cb6a8c8e388bd2f06169672d736886983a5049da608d2862ad843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Aggregate query</topic><topic>Aggregates</topic><topic>Complexity</topic><topic>Cubes</topic><topic>Data visualization</topic><topic>Datasets</topic><topic>hashing</topic><topic>Histograms</topic><topic>large-scale data visualization</topic><topic>Queries</topic><topic>R-tree</topic><topic>Random access</topic><topic>Response time (computers)</topic><topic>Social networking (online)</topic><topic>Specifications</topic><topic>summed area table</topic><topic>Tables (data)</topic><topic>Time factors</topic><topic>Visual databases</topic><topic>visual query</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mei, Honghui</creatorcontrib><creatorcontrib>Chen, Wei</creatorcontrib><creatorcontrib>Wei, Yating</creatorcontrib><creatorcontrib>Hu, Yuanzhe</creatorcontrib><creatorcontrib>Zhou, Shuyue</creatorcontrib><creatorcontrib>Lin, Bingru</creatorcontrib><creatorcontrib>Zhao, Ying</creatorcontrib><creatorcontrib>Xia, Jiazhi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package 
(ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on visualization and computer graphics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mei, Honghui</au><au>Chen, Wei</au><au>Wei, Yating</au><au>Hu, Yuanzhe</au><au>Zhou, Shuyue</au><au>Lin, Bingru</au><au>Zhao, Ying</au><au>Xia, Jiazhi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query</atitle><jtitle>IEEE transactions on visualization and computer graphics</jtitle><stitle>TVCG</stitle><addtitle>IEEE Trans Vis Comput Graph</addtitle><date>2020-01</date><risdate>2020</risdate><volume>26</volume><issue>1</issue><spage>1161</spage><epage>1171</epage><pages>1161-1171</pages><issn>1077-2626</issn><eissn>1941-0506</eissn><coden>ITVGEA</coden><abstract>Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are constantly extremely large; thus, the response time should be accelerated by calculating predefined data cubes. 
However, the queries are limited to the predefined binning schema of preprocessed data cubes. Such limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>31443022</pmid><doi>10.1109/TVCG.2019.2934800</doi><tpages>11</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1077-2626
ispartof IEEE transactions on visualization and computer graphics, 2020-01, Vol.26 (1), p.1161-1171
issn 1077-2626
1941-0506
language eng
recordid cdi_ieee_primary_8807303
source IEEE Electronic Library (IEL)
subjects Aggregate query
Aggregates
Complexity
Cubes
Data visualization
Datasets
hashing
Histograms
large-scale data visualization
Queries
R-tree
Random access
Response time (computers)
Social networking (online)
Specifications
summed area table
Tables (data)
Time factors
Visual databases
visual query
Visualization
title RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T08%3A09%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RSATree:%20Distribution-Aware%20Data%20Representation%20of%20Large-Scale%20Tabular%20Datasets%20for%20Flexible%20Visual%20Query&rft.jtitle=IEEE%20transactions%20on%20visualization%20and%20computer%20graphics&rft.au=Mei,%20Honghui&rft.date=2020-01&rft.volume=26&rft.issue=1&rft.spage=1161&rft.epage=1171&rft.pages=1161-1171&rft.issn=1077-2626&rft.eissn=1941-0506&rft.coden=ITVGEA&rft_id=info:doi/10.1109/TVCG.2019.2934800&rft_dat=%3Cproquest_RIE%3E2280545711%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2317727226&rft_id=info:pmid/31443022&rft_ieee_id=8807303&rfr_iscdi=true