A semantic-aware data generator for ETL workflows
Summary: Extract, transform, and load (ETL) processes organized as workflows play an important role in future data integration for cloud services. ETL designers and administrators need test data sets that reflect the semantics of ETL workflow workloads to evaluate the ETL systems they develop. Popul...
Published in: | Concurrency and computation 2016-03, Vol.28 (4), p.1016-1040 |
---|---|
Main authors: | Du, Naiqiao; Ye, Xiaojun; Wang, Jianmin |
Format: | Article |
Language: | eng |
Subjects: | ETL workflow; symbolic test; synthetic data; workload characterization |
Online access: | Full text |
container_end_page | 1040 |
---|---|
container_issue | 4 |
container_start_page | 1016 |
container_title | Concurrency and computation |
container_volume | 28 |
creator | Du, Naiqiao; Ye, Xiaojun; Wang, Jianmin |
description | Summary
Extract, transform, and load (ETL) processes organized as workflows play an important role in future data integration for cloud services. ETL designers and administrators need test data sets that reflect the semantics of ETL workflow workloads to evaluate the ETL systems they develop. Populating ETL test systems with meaningful workload data is a difficult task. In this paper, we propose a semantic‐aware data generator for ETL workflows. Given ETL workflow models and workload characterizations, the generator produces synthetic data that capture the semantics of ETL activities. This is carried out in three stages. First, we derive the expected cardinalities of all source, intermediate, and target data sets involved in the ETL workflow model from user‐specified cardinality requirements. Then, following the symbolic test concept, symbolic data rather than concrete data are generated for the ETL activities, and the semantics of the ETL workflow models are transformed into constraints over these symbols. Finally, concrete data are derived by resolving these constraints. Our generator may facilitate ETL workload test case generation for performance and functional evaluation of ETL toolkits as well as for benchmarking ETL workflow solutions. Copyright © 2013 John Wiley & Sons, Ltd. |
doi_str_mv | 10.1002/cpe.3028 |
format | Article |
publisher | Blackwell Publishing Ltd |
date | 2016-03-25 |
addtitle | Concurrency Computat.: Pract. Exper |
eissn | 1532-0634 |
rights | Copyright © 2013 John Wiley & Sons, Ltd. |
toplevel | peer_reviewed; online_resources |
collection | Istex; CrossRef |
tpages | 25 |
linktopdf | https://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcpe.3028 |
linktohtml | https://onlinelibrary.wiley.com/doi/full/10.1002%2Fcpe.3028 |
fulltext | fulltext |
identifier | ISSN: 1532-0626 |
ispartof | Concurrency and computation, 2016-03, Vol.28 (4), p.1016-1040 |
issn | 1532-0626; 1532-0634 |
language | eng |
recordid | cdi_crossref_primary_10_1002_cpe_3028 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | ETL workflow; symbolic test; synthetic data; workload characterization |
title | A semantic-aware data generator for ETL workflows |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T18%3A27%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-istex_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20semantic-aware%20data%20generator%20for%20ETL%20workflows&rft.jtitle=Concurrency%20and%20computation&rft.au=Du,%20Naiqiao&rft.date=2016-03-25&rft.volume=28&rft.issue=4&rft.spage=1016&rft.epage=1040&rft.pages=1016-1040&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.3028&rft_dat=%3Cistex_cross%3Eark_67375_WNG_CQC4Z3TW_N%3C/istex_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
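
The three stages named in the description field (cardinality derivation, symbolic data with constraints, constraint resolution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the single-filter workflow, the `amount` attribute, the `threshold`, the value domain, and every function name below are assumptions made purely for illustration, and a real generator would presumably pass the constraints to a solver instead of sampling from intervals.

```python
import random
from dataclasses import dataclass


@dataclass
class SymbolicRow:
    """A row whose 'amount' value is only bounded, not yet concrete."""
    name: str  # hypothetical symbol name, e.g. "r0.amount"
    low: int   # inclusive lower bound implied by the constraints
    high: int  # inclusive upper bound implied by the constraints


def derive_cardinalities(source_rows: int, selectivity: float) -> dict:
    """Stage 1: propagate a user-specified source cardinality through
    one filter activity (assumed workflow: source -> filter -> target)."""
    return {"source": source_rows, "target": round(source_rows * selectivity)}


def generate_symbolic_rows(card: dict, threshold: int, domain=(0, 1000)) -> list:
    """Stage 2: create one symbol per source row and encode the filter
    semantics (amount > threshold) as interval constraints."""
    lo, hi = domain
    rows = []
    for i in range(card["source"]):
        if i < card["target"]:
            # This row must pass the filter.
            rows.append(SymbolicRow(f"r{i}.amount", threshold + 1, hi))
        else:
            # This row must be rejected by the filter.
            rows.append(SymbolicRow(f"r{i}.amount", lo, threshold))
    return rows


def resolve(rows: list, seed: int = 0) -> list:
    """Stage 3: turn each symbol into a concrete value that satisfies
    its constraints (here: a seeded random pick inside the interval)."""
    rng = random.Random(seed)
    return [rng.randint(r.low, r.high) for r in rows]


card = derive_cardinalities(source_rows=10, selectivity=0.3)
symbolic = generate_symbolic_rows(card, threshold=500)
concrete = resolve(symbolic)

# Running the assumed filter on the generated data reproduces the
# cardinality derived in stage 1.
target = [v for v in concrete if v > 500]
print(card, len(target))
```

Because each symbol's interval encodes on which side of the filter the row must fall, the number of generated values that pass the filter equals the derived target cardinality regardless of the random seed.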