Processing large-scale multi-dimensional data in parallel and distributed environments

Bibliographic details
Published in:	Parallel computing 2002-05, Vol.28 (5), p.827-859
Main authors:	Beynon, Michael; Chang, Chialin; Catalyurek, Umit; Kurc, Tahsin; Sussman, Alan; Andrade, Henrique; Ferreira, Renato; Saltz, Joel
Format:	Article
Language:	English
Keywords:	Data-intensive applications; Distributed computing; Multi-dimensional datasets; Parallel processing; Runtime systems

container_end_page	859
container_issue	5
container_start_page	827
container_title	Parallel computing
container_volume	28
creator	Beynon, Michael; Chang, Chialin; Catalyurek, Umit; Kurc, Tahsin; Sussman, Alan; Andrade, Henrique; Ferreira, Renato; Saltz, Joel
description	Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly hindered by dataset sizes. The vast amount of data in scientific datasets makes it difficult to access the data of interest efficiently and to manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
doi_str_mv	10.1016/S0167-8191(02)00097-2
format	Article
publisher	Elsevier B.V.
date	2002-05-01
rights	2002 Elsevier Science B.V.
identifier	ISSN: 0167-8191
ispartof	Parallel computing, 2002-05, Vol.28 (5), p.827-859
issn	0167-8191 (print); 1872-7336 (electronic)
language	eng
recordid	cdi_proquest_miscellaneous_27661611
source	Access via ScienceDirect (Elsevier)
subjects	Data-intensive applications; Distributed computing; Multi-dimensional datasets; Parallel processing; Runtime systems
title	Processing large-scale multi-dimensional data in parallel and distributed environments
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Processing%20large-scale%20multi-dimensional%20data%20in%20parallel%20and%20distributed%20environments&rft.jtitle=Parallel%20computing&rft.au=Beynon,%20Michael&rft.date=2002-05-01&rft.volume=28&rft.issue=5&rft.spage=827&rft.epage=859&rft.pages=827-859&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/S0167-8191(02)00097-2&rft_dat=%3Cproquest_cross%3E27661611%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27661611&rft_id=info:pmid/&rft_els_id=S0167819102000972&rfr_iscdi=true
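The abstract names subsetting and aggregation as the core operations on large multi-dimensional datasets. The Python sketch below illustrates those two operations in a generic way. It is not the frameworks described in the paper. It assumes a 2-D dataset stored as chunks, each chunk carrying its own bounding box. The names `Chunk`, `intersects` and `range_sum` are hypothetical, as is the choice of summing values into fixed-size grid cells. A thread pool stands in for parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # ((xmin, ymin), (xmax, ymax)), inclusive bounds
Grid = Dict[Tuple[int, int], float]


@dataclass
class Chunk:
    """Hypothetical storage unit: (point, value) items plus their bounding box."""
    bbox: Box
    items: List[Tuple[Point, float]]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def inside(p: Point, box: Box) -> bool:
    """True if point p lies within box (inclusive bounds)."""
    (x0, y0), (x1, y1) = box
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1


def local_aggregate(chunk: Chunk, query: Box, cell: float) -> Grid:
    """Subset one chunk to the query box and sum its values into grid cells."""
    acc: Grid = {}
    (qx0, qy0), _ = query
    for p, v in chunk.items:
        if inside(p, query):
            key = (int((p[0] - qx0) // cell), int((p[1] - qy0) // cell))
            acc[key] = acc.get(key, 0.0) + v
    return acc


def range_sum(chunks: List[Chunk], query: Box, cell: float, workers: int = 4) -> Grid:
    # Subsetting, coarse step: skip chunks whose bounding box misses the query.
    selected = [c for c in chunks if intersects(c.bbox, query)]
    # Aggregation: each selected chunk is reduced to a partial grid, possibly in
    # parallel. pool.map returns the partials in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: local_aggregate(c, query, cell), selected))
    # Merge the partial results. Summation is associative, so partials could also
    # be combined pairwise across nodes.
    result: Grid = {}
    for part in partials:
        for key, val in part.items():
            result[key] = result.get(key, 0.0) + val
    return result


chunks = [
    Chunk(((0, 0), (1, 1)), [((0.5, 0.5), 1.0), ((0.9, 0.1), 2.0)]),
    Chunk(((2, 2), (3, 3)), [((2.5, 2.5), 5.0)]),
]
print(range_sum(chunks, ((0, 0), (1, 1)), cell=0.5))  # → {(1, 1): 1.0, (1, 0): 2.0}
```

The example uses a two-level design. The bounding-box test prunes whole chunks before any item is read. The per-item test then refines the subset. This mirrors, in a simplified form, why chunked storage with spatial metadata helps when only a small region of a large dataset is of interest.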