Processing large-scale multi-dimensional data in parallel and distributed environments
Published in: | Parallel computing 2002-05, Vol.28 (5), p.827-859 |
---|---|
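Authors: | Beynon, Michael; Chang, Chialin; Catalyurek, Umit; Kurc, Tahsin; Sussman, Alan; Andrade, Henrique; Ferreira, Renato; Saltz, Joel |
Format: | Article |
Language: | eng |
Subjects: | Data-intensive applications; Distributed computing; Multi-dimensional datasets; Parallel processing; Runtime systems |
Online access: | Full text |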
container_end_page | 859 |
---|---|
container_issue | 5 |
container_start_page | 827 |
container_title | Parallel computing |
container_volume | 28 |
creator | Beynon, Michael; Chang, Chialin; Catalyurek, Umit; Kurc, Tahsin; Sussman, Alan; Andrade, Henrique; Ferreira, Renato; Saltz, Joel |
description | Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments. |
doi_str_mv | 10.1016/S0167-8191(02)00097-2 |
format | Article |
publisher | Elsevier B.V. |
rights | 2002 Elsevier Science B.V. |
fulltext | fulltext |
identifier | ISSN: 0167-8191 |
ispartof | Parallel computing, 2002-05, Vol.28 (5), p.827-859 |
issn | 0167-8191 (print); 1872-7336 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_27661611 |
source | Access via ScienceDirect (Elsevier) |
subjects | Data-intensive applications; Distributed computing; Multi-dimensional datasets; Parallel processing; Runtime systems |
title | Processing large-scale multi-dimensional data in parallel and distributed environments |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Processing%20large-scale%20multi-dimensional%20data%20in%20parallel%20and%20distributed%20environments&rft.jtitle=Parallel%20computing&rft.au=Beynon,%20Michael&rft.date=2002-05-01&rft.volume=28&rft.issue=5&rft.spage=827&rft.epage=859&rft.pages=827-859&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/S0167-8191(02)00097-2&rft_dat=%3Cproquest_cross%3E27661611%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27661611&rft_id=info:pmid/&rft_els_id=S0167819102000972&rfr_iscdi=true |
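The abstract identifies subsetting and aggregation as the common operations that data-intensive applications perform on large multi-dimensional datasets. The following is a minimal, generic sketch of those two steps in plain Python. It is not the frameworks or runtime systems presented in the paper: the function names `subset` and `aggregate`, the half-open query box, the uniform output grid, and the use of summation as the aggregation function are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical data model: each element is (n-dimensional coordinate, value).
Point = Tuple[float, ...]
Element = Tuple[Point, float]


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the half-open query box [lo, hi) in every dimension."""
    return all(l <= x < h for x, l, h in zip(p, lo, hi))


def subset(data: Iterable[Element], lo: Point, hi: Point) -> List[Element]:
    """Subsetting: keep only the elements whose coordinates fall in the query box."""
    return [(p, v) for p, v in data if in_box(p, lo, hi)]


def aggregate(elements: Iterable[Element], origin: Point,
              cell: Point) -> Dict[Tuple[int, ...], float]:
    """Aggregation: map each element to a cell of a uniform output grid
    (anchored at `origin`, cell extents `cell`) and sum the values per cell."""
    out: Dict[Tuple[int, ...], float] = defaultdict(float)
    for p, v in elements:
        idx = tuple(int((x - o) // c) for x, o, c in zip(p, origin, cell))
        out[idx] += v
    return dict(out)


# Usage: four 2-D elements; the last one lies outside the query box.
data = [((0.5, 0.5), 1.0), ((1.5, 0.5), 2.0), ((1.7, 0.2), 3.0), ((5.0, 5.0), 10.0)]
selected = subset(data, (0.0, 0.0), (2.0, 2.0))
grid = aggregate(selected, (0.0, 0.0), (1.0, 1.0))
print(grid)  # {(0, 0): 1.0, (1, 0): 5.0}
```

Because summation is associative, the per-cell partial results from separate chunks of a dataset could be combined afterwards, which is what makes this kind of subset-then-aggregate pattern amenable to parallel and distributed execution.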