Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version

Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current s...

Detailed description

Bibliographic details
Published in:	arXiv.org 2024-07
Main authors:	Yu, Geoffrey X; Wu, Ziniu; Kossmann, Ferdi; Li, Tianyu; Markakis, Markos; Ngom, Amadou; Madden, Samuel; Kraska, Tim
Format:	Article
Language:	eng
Subjects:	Computer Science - Databases; Cost control; Data base management systems; Data management; Design; Design optimization; Infrastructure; Performance evaluation; Workload; Workloads
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Yu, Geoffrey X; Wu, Ziniu; Kossmann, Ferdi; Li, Tianyu; Markakis, Markos; Ngom, Amadou; Madden, Samuel; Kraska, Tim
description	Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific clients), making it difficult to change after initial deployment. A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meets user-defined performance targets and improves cost savings by 1.6-13x compared to serverless auto-scaling or HTAP systems.
doi_str_mv	10.48550/arxiv.2407.15363
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2407_15363</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3083766772</sourcerecordid><originalsourceid>FETCH-LOGICAL-a522-e8fe59ce32feeb0d1227680940ed2353dc7d597c2ad85506575acc9b8e0b99783</originalsourceid><addsrcrecordid>eNotkF1rwjAUhsNgMHH-gF0tsOu6NGmadHd-7EMQhOF2W2JyOiM1dUnqdL9-Vnd14JyHl_M-CN2lZJhJzsmj8ge7H9KMiGHKWc6uUI8yliYyo_QGDULYEEJoLijnrIf247qFnbcuWveF4xrwpG5a84Q_nK2O3U45g0dtbLYqWq3q-ogXu2i39rc7nmE8VVHhmau8CtG3OrYeAv6xcY3H76MpThL8fIjgDBj8CT7Yxt2i60rVAQb_s4-WL8_LyVsyX7zOJqN5ojilCcgKeKGB0QpgRUxKqcglKTIChjLOjBaGF0JTZbrqORdcaV2sJJBVUQjJ-uj-EnuWUp56bpU_lp2c8iznRDxciJ1vvlsIsdw0rXenn0pGJBN5LgRlf_fwaC4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3083766772</pqid></control><display><type>article</type><title>Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version</title><source>Freely Accessible Journals</source><source>arXiv.org</source><creator>Yu, Geoffrey X ; Wu, Ziniu ; Kossmann, Ferdi ; Li, Tianyu ; Markakis, Markos ; Ngom, Amadou ; Madden, Samuel ; Kraska, Tim</creator><creatorcontrib>Yu, Geoffrey X ; Wu, Ziniu ; Kossmann, Ferdi ; Li, Tianyu ; Markakis, Markos ; Ngom, Amadou ; Madden, Samuel ; Kraska, Tim</creatorcontrib><description>Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific clients), making it difficult to change after initial deployment. 
A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meet user-defined performance targets and improve cost-savings by 1.6-13x compared to serverless auto-scaling or HTAP systems.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2407.15363</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Databases ; Cost control ; Data base management systems ; Data management ; Design ; Design optimization ; Infrastructure ; Performance evaluation ; Workload ; Workloads</subject><ispartof>arXiv.org, 2024-07</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2407.15363$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.14778/3681954.3682026$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Yu, Geoffrey X</creatorcontrib><creatorcontrib>Wu, Ziniu</creatorcontrib><creatorcontrib>Kossmann, Ferdi</creatorcontrib><creatorcontrib>Li, Tianyu</creatorcontrib><creatorcontrib>Markakis, Markos</creatorcontrib><creatorcontrib>Ngom, Amadou</creatorcontrib><creatorcontrib>Madden, Samuel</creatorcontrib><creatorcontrib>Kraska, Tim</creatorcontrib><title>Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version</title><title>arXiv.org</title><description>Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific clients), making it difficult to change after initial deployment. A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. 
In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meet user-defined performance targets and improve cost-savings by 1.6-13x compared to serverless auto-scaling or HTAP systems.</description><subject>Computer Science - Databases</subject><subject>Cost control</subject><subject>Data base management systems</subject><subject>Data management</subject><subject>Design</subject><subject>Design optimization</subject><subject>Infrastructure</subject><subject>Performance 
evaluation</subject><subject>Workload</subject><subject>Workloads</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotkF1rwjAUhsNgMHH-gF0tsOu6NGmadHd-7EMQhOF2W2JyOiM1dUnqdL9-Vnd14JyHl_M-CN2lZJhJzsmj8ge7H9KMiGHKWc6uUI8yliYyo_QGDULYEEJoLijnrIf247qFnbcuWveF4xrwpG5a84Q_nK2O3U45g0dtbLYqWq3q-ogXu2i39rc7nmE8VVHhmau8CtG3OrYeAv6xcY3H76MpThL8fIjgDBj8CT7Yxt2i60rVAQb_s4-WL8_LyVsyX7zOJqN5ojilCcgKeKGB0QpgRUxKqcglKTIChjLOjBaGF0JTZbrqORdcaV2sJJBVUQjJ-uj-EnuWUp56bpU_lp2c8iznRDxciJ1vvlsIsdw0rXenn0pGJBN5LgRlf_fwaC4</recordid><startdate>20240722</startdate><enddate>20240722</enddate><creator>Yu, Geoffrey X</creator><creator>Wu, Ziniu</creator><creator>Kossmann, Ferdi</creator><creator>Li, Tianyu</creator><creator>Markakis, Markos</creator><creator>Ngom, Amadou</creator><creator>Madden, Samuel</creator><creator>Kraska, Tim</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240722</creationdate><title>Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version</title><author>Yu, Geoffrey X ; Wu, Ziniu ; Kossmann, Ferdi ; Li, Tianyu ; Markakis, Markos ; Ngom, Amadou ; Madden, Samuel ; Kraska, 
Tim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a522-e8fe59ce32feeb0d1227680940ed2353dc7d597c2ad85506575acc9b8e0b99783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Databases</topic><topic>Cost control</topic><topic>Data base management systems</topic><topic>Data management</topic><topic>Design</topic><topic>Design optimization</topic><topic>Infrastructure</topic><topic>Performance evaluation</topic><topic>Workload</topic><topic>Workloads</topic><toplevel>online_resources</toplevel><creatorcontrib>Yu, Geoffrey X</creatorcontrib><creatorcontrib>Wu, Ziniu</creatorcontrib><creatorcontrib>Kossmann, Ferdi</creatorcontrib><creatorcontrib>Li, Tianyu</creatorcontrib><creatorcontrib>Markakis, Markos</creatorcontrib><creatorcontrib>Ngom, Amadou</creatorcontrib><creatorcontrib>Madden, Samuel</creatorcontrib><creatorcontrib>Kraska, Tim</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering 
Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yu, Geoffrey X</au><au>Wu, Ziniu</au><au>Kossmann, Ferdi</au><au>Li, Tianyu</au><au>Markakis, Markos</au><au>Ngom, Amadou</au><au>Madden, Samuel</au><au>Kraska, Tim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version</atitle><jtitle>arXiv.org</jtitle><date>2024-07-22</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific clients), making it difficult to change after initial deployment. A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. 
BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meet user-defined performance targets and improve cost-savings by 1.6-13x compared to serverless auto-scaling or HTAP systems.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2407.15363</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-07
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2407_15363
source	Freely Accessible Journals; arXiv.org
subjects	Computer Science - Databases; Cost control; Data base management systems; Data management; Design; Design optimization; Infrastructure; Performance evaluation; Workload; Workloads
title	Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T07%3A05%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Blueprinting%20the%20Cloud:%20Unifying%20and%20Automatically%20Optimizing%20Cloud%20Data%20Infrastructures%20with%20BRAD%20--%20Extended%20Version&rft.jtitle=arXiv.org&rft.au=Yu,%20Geoffrey%20X&rft.date=2024-07-22&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2407.15363&rft_dat=%3Cproquest_arxiv%3E3083766772%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3083766772&rft_id=info:pmid/&rfr_iscdi=true