A semantic-aware data generator for ETL workflows

Summary: Extract, transform, and load (ETL) processes organized as workflows play an important role in future data integration for cloud services. ETL designers and administrators need test data sets that reflect the semantics of ETL workflow workloads to evaluate the ETL systems they develop. Popul...

Detailed description

Bibliographic details
Published in: Concurrency and computation 2016-03, Vol.28 (4), p.1016-1040
Main authors: Du, Naiqiao; Ye, Xiaojun; Wang, Jianmin
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1040
container_issue 4
container_start_page 1016
container_title Concurrency and computation
container_volume 28
creator Du, Naiqiao
Ye, Xiaojun
Wang, Jianmin
description Summary: Extract, transform, and load (ETL) processes organized as workflows play an important role in future data integration for cloud services. ETL designers and administrators need test data sets that reflect the semantics of ETL workflow workloads to evaluate the ETL systems they develop. Populating ETL test systems with meaningful workload data is a difficult task. In this paper, we propose a semantic‐aware data generator for ETL workflows. Given ETL workflow models and workload characterizations, the generator produces synthetic data that capture the semantics of ETL activities. This is carried out in a three‐stage approach. First, we derive the expected cardinalities of all source, intermediate, and target data sets involved in the ETL workflow model from user‐specified cardinality requirements. Then, using the concept of symbolic test, symbolic data rather than concrete data are generated for the ETL activities, and the semantics of the ETL workflow models are transformed into constraints over these symbols. Finally, concrete data are derived by resolving the constraints. Our generator may facilitate ETL workload test case generation for ETL toolkit performance and function evaluations as well as ETL workflow solution benchmarking. Copyright © 2013 John Wiley & Sons, Ltd.
doi_str_mv 10.1002/cpe.3028
format Article
fullrecord <record><control><sourceid>istex_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1002_cpe_3028</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>ark_67375_WNG_CQC4Z3TW_N</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3348-154484957af190adb04c133c8e6eb9488b33139161a12b98fc71d4af402f3b3d3</originalsourceid><addsrcrecordid>eNp1z8FKw0AQgOFFFKxV8BFy9JK6k9kkm2MJtRVKVYgUvCyTza7Epk3ZDcS-vSmVggcPw8zhY-Bn7B74BDiPHvXeTJBH8oKNIMYo5AmKy_MdJdfsxvsvzgE4wojBNPBmS7uu1iH15ExQUUfBp9kZR13rAjvMrFgGfes2tml7f8uuLDXe3P3uMXt_mhX5Ily-zJ_z6TLUiEKGEAshRRanZCHjVJVcaEDU0iSmzISUJSJgBgkQRGUmrU6hEmQFjyyWWOGYPZz-atd674xVe1dvyR0UcHVMVUOqOqYONDzRvm7M4V-n8tfZX1_7znyfPbmNSlJMY7VezVX-losPLNZqhT-HuWIX</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A semantic-aware data generator for ETL workflows</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Du, Naiqiao ; Ye, Xiaojun ; Wang, Jianmin</creator><creatorcontrib>Du, Naiqiao ; Ye, Xiaojun ; Wang, Jianmin</creatorcontrib><description>Summary Extract, transform, and load (ETL) processes organized as workflows play an important role in the future data integration for cloud services. ETL designers/administrators need testing data set that is aware of semantics of ETL workflow workloads to evaluate their developed ETL systems. Populating testing ETL systems with meaningful workload data is a difficult task. In this paper, we propose a semantic‐aware data generator for ETL workflows. With given ETL workflow models and workload characterizations, the generator is able to generate synthetic data that capture the semantics of ETL activities. This is carried out by a three‐staged approach. First, we derive expected cardinalities of all the source, intermediate, and target data sets involved in the ETL workflow model with some user‐specified cardinality requirements. 
Then, with the concept of symbolic test, symbolic data instead of concrete data involved in ETL activities are generated, and semantics of the ETL workflow models are transformed to various constraints over these symbols. At last, concrete data are derived on the basis of resolving constraints. Our generator may facilitate ETL workload test case generation for ETL toolkit performance and function evaluations as well as ETL workflow solution benchmarking. Copyright © 2013 John Wiley &amp; Sons, Ltd.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.3028</identifier><language>eng</language><publisher>Blackwell Publishing Ltd</publisher><subject>ETL workflow ; symbolic test ; synthetic data ; workload characterization</subject><ispartof>Concurrency and computation, 2016-03, Vol.28 (4), p.1016-1040</ispartof><rights>Copyright © 2013 John Wiley &amp; Sons, Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c3348-154484957af190adb04c133c8e6eb9488b33139161a12b98fc71d4af402f3b3d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcpe.3028$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fcpe.3028$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Du, Naiqiao</creatorcontrib><creatorcontrib>Ye, Xiaojun</creatorcontrib><creatorcontrib>Wang, Jianmin</creatorcontrib><title>A semantic-aware data generator for ETL workflows</title><title>Concurrency and computation</title><addtitle>Concurrency Computat.: Pract. 
Exper</addtitle><description>Summary Extract, transform, and load (ETL) processes organized as workflows play an important role in the future data integration for cloud services. ETL designers/administrators need testing data set that is aware of semantics of ETL workflow workloads to evaluate their developed ETL systems. Populating testing ETL systems with meaningful workload data is a difficult task. In this paper, we propose a semantic‐aware data generator for ETL workflows. With given ETL workflow models and workload characterizations, the generator is able to generate synthetic data that capture the semantics of ETL activities. This is carried out by a three‐staged approach. First, we derive expected cardinalities of all the source, intermediate, and target data sets involved in the ETL workflow model with some user‐specified cardinality requirements. Then, with the concept of symbolic test, symbolic data instead of concrete data involved in ETL activities are generated, and semantics of the ETL workflow models are transformed to various constraints over these symbols. At last, concrete data are derived on the basis of resolving constraints. Our generator may facilitate ETL workload test case generation for ETL toolkit performance and function evaluations as well as ETL workflow solution benchmarking. 
Copyright © 2013 John Wiley &amp; Sons, Ltd.</description><subject>ETL workflow</subject><subject>symbolic test</subject><subject>synthetic data</subject><subject>workload characterization</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp1z8FKw0AQgOFFFKxV8BFy9JK6k9kkm2MJtRVKVYgUvCyTza7Epk3ZDcS-vSmVggcPw8zhY-Bn7B74BDiPHvXeTJBH8oKNIMYo5AmKy_MdJdfsxvsvzgE4wojBNPBmS7uu1iH15ExQUUfBp9kZR13rAjvMrFgGfes2tml7f8uuLDXe3P3uMXt_mhX5Ily-zJ_z6TLUiEKGEAshRRanZCHjVJVcaEDU0iSmzISUJSJgBgkQRGUmrU6hEmQFjyyWWOGYPZz-atd674xVe1dvyR0UcHVMVUOqOqYONDzRvm7M4V-n8tfZX1_7znyfPbmNSlJMY7VezVX-losPLNZqhT-HuWIX</recordid><startdate>20160325</startdate><enddate>20160325</enddate><creator>Du, Naiqiao</creator><creator>Ye, Xiaojun</creator><creator>Wang, Jianmin</creator><general>Blackwell Publishing Ltd</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20160325</creationdate><title>A semantic-aware data generator for ETL workflows</title><author>Du, Naiqiao ; Ye, Xiaojun ; Wang, Jianmin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3348-154484957af190adb04c133c8e6eb9488b33139161a12b98fc71d4af402f3b3d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>ETL workflow</topic><topic>symbolic test</topic><topic>synthetic data</topic><topic>workload characterization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Du, Naiqiao</creatorcontrib><creatorcontrib>Ye, Xiaojun</creatorcontrib><creatorcontrib>Wang, Jianmin</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Du, 
Naiqiao</au><au>Ye, Xiaojun</au><au>Wang, Jianmin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A semantic-aware data generator for ETL workflows</atitle><jtitle>Concurrency and computation</jtitle><addtitle>Concurrency Computat.: Pract. Exper</addtitle><date>2016-03-25</date><risdate>2016</risdate><volume>28</volume><issue>4</issue><spage>1016</spage><epage>1040</epage><pages>1016-1040</pages><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Summary Extract, transform, and load (ETL) processes organized as workflows play an important role in the future data integration for cloud services. ETL designers/administrators need testing data set that is aware of semantics of ETL workflow workloads to evaluate their developed ETL systems. Populating testing ETL systems with meaningful workload data is a difficult task. In this paper, we propose a semantic‐aware data generator for ETL workflows. With given ETL workflow models and workload characterizations, the generator is able to generate synthetic data that capture the semantics of ETL activities. This is carried out by a three‐staged approach. First, we derive expected cardinalities of all the source, intermediate, and target data sets involved in the ETL workflow model with some user‐specified cardinality requirements. Then, with the concept of symbolic test, symbolic data instead of concrete data involved in ETL activities are generated, and semantics of the ETL workflow models are transformed to various constraints over these symbols. At last, concrete data are derived on the basis of resolving constraints. Our generator may facilitate ETL workload test case generation for ETL toolkit performance and function evaluations as well as ETL workflow solution benchmarking. Copyright © 2013 John Wiley &amp; Sons, Ltd.</abstract><pub>Blackwell Publishing Ltd</pub><doi>10.1002/cpe.3028</doi><tpages>25</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1532-0626
ispartof Concurrency and computation, 2016-03, Vol.28 (4), p.1016-1040
issn 1532-0626
1532-0634
language eng
recordid cdi_crossref_primary_10_1002_cpe_3028
source Wiley Online Library Journals Frontfile Complete
subjects ETL workflow
symbolic test
synthetic data
workload characterization
title A semantic-aware data generator for ETL workflows
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T18%3A27%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-istex_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20semantic-aware%20data%20generator%20for%20ETL%20workflows&rft.jtitle=Concurrency%20and%20computation&rft.au=Du,%20Naiqiao&rft.date=2016-03-25&rft.volume=28&rft.issue=4&rft.spage=1016&rft.epage=1040&rft.pages=1016-1040&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.3028&rft_dat=%3Cistex_cross%3Eark_67375_WNG_CQC4Z3TW_N%3C/istex_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
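
The three stages named in the description (cardinality derivation, symbolic data with constraints, constraint resolution) can be illustrated with a toy example. The sketch below is not the authors' generator: the single-filter workflow, the `Constraint` class, the function names, the `amount` field, and the interval-based "solver" are all assumptions made up for illustration, and a real workflow would involve many activities and a proper constraint solver.

```python
import random
from dataclasses import dataclass


@dataclass
class Constraint:
    """Hypothetical interval constraint: low <= value <= high for one symbol."""
    symbol: str
    low: int
    high: int


def derive_cardinalities(source_rows, selectivity):
    """Stage 1 (assumed form): propagate a user-specified source cardinality
    through a single filter activity with the given selectivity."""
    passed = round(source_rows * selectivity)
    return {"source": source_rows, "filter_out": passed,
            "rejected": source_rows - passed}


def build_symbolic_rows(cards, threshold, max_amount=1000):
    """Stage 2 (assumed form): create one symbol per source row and record the
    constraint that the filter `amount > threshold` imposes on it."""
    constraints = []
    for i in range(cards["source"]):
        symbol = f"amount_{i}"
        if i < cards["filter_out"]:
            # This row must pass the filter.
            constraints.append(Constraint(symbol, threshold + 1, max_amount))
        else:
            # This row must be rejected by the filter.
            constraints.append(Constraint(symbol, 0, threshold))
    return constraints


def concretize(constraints, seed=0):
    """Stage 3 (assumed form): resolve each interval constraint by picking
    any concrete value inside its bounds."""
    rng = random.Random(seed)
    return [{"id": i, "amount": rng.randint(c.low, c.high)}
            for i, c in enumerate(constraints)]


cards = derive_cardinalities(source_rows=10, selectivity=0.4)
rows = concretize(build_symbolic_rows(cards, threshold=100))
passed = [r for r in rows if r["amount"] > 100]
print(cards, len(passed))
```

Running the filter `amount > 100` over the generated rows yields exactly the number of rows derived in the first stage, which is the property a semantic-aware generator is meant to guarantee for each activity.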