Processing large-scale multi-dimensional data in parallel and distributed environments

Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability...


Bibliographic details
Published in: Parallel computing 2002-05, Vol.28 (5), p.827-859
Main authors: Beynon, Michael; Chang, Chialin; Catalyurek, Umit; Kurc, Tahsin; Sussman, Alan; Andrade, Henrique; Ferreira, Renato; Saltz, Joel
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 859
container_issue 5
container_start_page 827
container_title Parallel computing
container_volume 28
creator Beynon, Michael
Chang, Chialin
Catalyurek, Umit
Kurc, Tahsin
Sussman, Alan
Andrade, Henrique
Ferreira, Renato
Saltz, Joel
description Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
doi_str_mv 10.1016/S0167-8191(02)00097-2
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_27661611</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167819102000972</els_id><sourcerecordid>27661611</sourcerecordid><originalsourceid>FETCH-LOGICAL-c253t-eb12f65654977ffed030e0ea366b49095a92e0eae1cc812d0a124be292418c3f3</originalsourceid><addsrcrecordid>eNqFkMtKxDAUhoMoOI4-gpCV6CKaS5u2KxHxBgMKXrYhTU6HSJqOSTrg29uZEbduzuHA9_9wPoROGb1klMmr12lUpGYNO6f8glLaVITvoRmrK04qIeQ-mv0hh-gopc8JkkVNZ-jjJQ4GUnJhib2OSyDJaA-4H312xLoeQnJD0B5bnTV2Aa901N6DxzpYbF3K0bVjBoshrF0cwpTI6RgddNonOPndc_R-f_d2-0gWzw9PtzcLYngpMoGW8U6Wsiyaquo6sFRQoKCFlG3R0KbUDd_cwIypGbdUM160wBtesNqITszR2a53FYevEVJWvUsGvNcBhjEpXknJJGMTWO5AE4eUInRqFV2v47diVG0sqq1FtVGkKFdbi4pPuetdDqYv1g6iSsZBMGBdBJOVHdw_DT-z2XrO</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>27661611</pqid></control><display><type>article</type><title>Processing large-scale multi-dimensional data in parallel and distributed environments</title><source>Access via ScienceDirect (Elsevier)</source><creator>Beynon, Michael ; Chang, Chialin ; Catalyurek, Umit ; Kurc, Tahsin ; Sussman, Alan ; Andrade, Henrique ; Ferreira, Renato ; Saltz, Joel</creator><creatorcontrib>Beynon, Michael ; Chang, Chialin ; Catalyurek, Umit ; Kurc, Tahsin ; Sussman, Alan ; Andrade, Henrique ; Ferreira, Renato ; Saltz, Joel</creatorcontrib><description>Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. 
The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.</description><identifier>ISSN: 0167-8191</identifier><identifier>EISSN: 1872-7336</identifier><identifier>DOI: 10.1016/S0167-8191(02)00097-2</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Data-intensive applications ; Distributed computing ; Multi-dimensional datasets ; Parallel processing ; Runtime systems</subject><ispartof>Parallel computing, 2002-05, Vol.28 (5), p.827-859</ispartof><rights>2002 Elsevier Science B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c253t-eb12f65654977ffed030e0ea366b49095a92e0eae1cc812d0a124be292418c3f3</citedby><cites>FETCH-LOGICAL-c253t-eb12f65654977ffed030e0ea366b49095a92e0eae1cc812d0a124be292418c3f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/S0167-8191(02)00097-2$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Beynon, Michael</creatorcontrib><creatorcontrib>Chang, Chialin</creatorcontrib><creatorcontrib>Catalyurek, Umit</creatorcontrib><creatorcontrib>Kurc, 
Tahsin</creatorcontrib><creatorcontrib>Sussman, Alan</creatorcontrib><creatorcontrib>Andrade, Henrique</creatorcontrib><creatorcontrib>Ferreira, Renato</creatorcontrib><creatorcontrib>Saltz, Joel</creatorcontrib><title>Processing large-scale multi-dimensional data in parallel and distributed environments</title><title>Parallel computing</title><description>Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. 
This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.</description><subject>Data-intensive applications</subject><subject>Distributed computing</subject><subject>Multi-dimensional datasets</subject><subject>Parallel processing</subject><subject>Runtime systems</subject><issn>0167-8191</issn><issn>1872-7336</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2002</creationdate><recordtype>article</recordtype><recordid>eNqFkMtKxDAUhoMoOI4-gpCV6CKaS5u2KxHxBgMKXrYhTU6HSJqOSTrg29uZEbduzuHA9_9wPoROGb1klMmr12lUpGYNO6f8glLaVITvoRmrK04qIeQ-mv0hh-gopc8JkkVNZ-jjJQ4GUnJhib2OSyDJaA-4H312xLoeQnJD0B5bnTV2Aa901N6DxzpYbF3K0bVjBoshrF0cwpTI6RgddNonOPndc_R-f_d2-0gWzw9PtzcLYngpMoGW8U6Wsiyaquo6sFRQoKCFlG3R0KbUDd_cwIypGbdUM160wBtesNqITszR2a53FYevEVJWvUsGvNcBhjEpXknJJGMTWO5AE4eUInRqFV2v47diVG0sqq1FtVGkKFdbi4pPuetdDqYv1g6iSsZBMGBdBJOVHdw_DT-z2XrO</recordid><startdate>20020501</startdate><enddate>20020501</enddate><creator>Beynon, Michael</creator><creator>Chang, Chialin</creator><creator>Catalyurek, Umit</creator><creator>Kurc, Tahsin</creator><creator>Sussman, Alan</creator><creator>Andrade, Henrique</creator><creator>Ferreira, Renato</creator><creator>Saltz, Joel</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20020501</creationdate><title>Processing large-scale multi-dimensional data in parallel and distributed environments</title><author>Beynon, Michael ; Chang, Chialin ; Catalyurek, Umit ; Kurc, Tahsin ; Sussman, Alan ; Andrade, Henrique ; Ferreira, Renato ; Saltz, 
Joel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c253t-eb12f65654977ffed030e0ea366b49095a92e0eae1cc812d0a124be292418c3f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2002</creationdate><topic>Data-intensive applications</topic><topic>Distributed computing</topic><topic>Multi-dimensional datasets</topic><topic>Parallel processing</topic><topic>Runtime systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Beynon, Michael</creatorcontrib><creatorcontrib>Chang, Chialin</creatorcontrib><creatorcontrib>Catalyurek, Umit</creatorcontrib><creatorcontrib>Kurc, Tahsin</creatorcontrib><creatorcontrib>Sussman, Alan</creatorcontrib><creatorcontrib>Andrade, Henrique</creatorcontrib><creatorcontrib>Ferreira, Renato</creatorcontrib><creatorcontrib>Saltz, Joel</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Beynon, Michael</au><au>Chang, Chialin</au><au>Catalyurek, Umit</au><au>Kurc, Tahsin</au><au>Sussman, Alan</au><au>Andrade, Henrique</au><au>Ferreira, Renato</au><au>Saltz, Joel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Processing large-scale multi-dimensional data in parallel and distributed environments</atitle><jtitle>Parallel 
computing</jtitle><date>2002-05-01</date><risdate>2002</risdate><volume>28</volume><issue>5</issue><spage>827</spage><epage>859</epage><pages>827-859</pages><issn>0167-8191</issn><eissn>1872-7336</eissn><abstract>Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.</abstract><pub>Elsevier B.V</pub><doi>10.1016/S0167-8191(02)00097-2</doi><tpages>33</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-8191
ispartof Parallel computing, 2002-05, Vol.28 (5), p.827-859
issn 0167-8191
1872-7336
language eng
recordid cdi_proquest_miscellaneous_27661611
source Access via ScienceDirect (Elsevier)
subjects Data-intensive applications
Distributed computing
Multi-dimensional datasets
Parallel processing
Runtime systems
title Processing large-scale multi-dimensional data in parallel and distributed environments
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T19%3A56%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Processing%20large-scale%20multi-dimensional%20data%20in%20parallel%20and%20distributed%20environments&rft.jtitle=Parallel%20computing&rft.au=Beynon,%20Michael&rft.date=2002-05-01&rft.volume=28&rft.issue=5&rft.spage=827&rft.epage=859&rft.pages=827-859&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/S0167-8191(02)00097-2&rft_dat=%3Cproquest_cross%3E27661611%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=27661611&rft_id=info:pmid/&rft_els_id=S0167819102000972&rfr_iscdi=true