Informal Data Transformation Considered Harmful

In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and val...

Detailed description

Bibliographic details
Main authors:	Daimler, Eric; Wisnesky, Ryan
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Databases
Online access:	Order full text

creator	Daimler, Eric; Wisnesky, Ryan
description	In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex post facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it is transformed (migrated, integrated, composed, queried, viewed, etc.) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.
doi_str_mv	10.48550/arxiv.2001.00338
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2001_00338</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2001_00338</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-d92ef5b5ef0833346b23078870382638613dd4b7315cafd156b5e2582fd8e8be3</originalsourceid><addsrcrecordid>eNotzs1qwzAQBGBdcghJHiCn-gXsrLSWvD0W589g6MV3s64kMPinyElI3j6J29PAMAyfEFsJSUpaw47Dvb0lCkAmAIi0FLti8GPouYv2fOGoCjxMc3FpxyHKx2FqrQvORmcOvb92a7Hw3E1u858rUR0PVX6Oy-9TkX-VMZuMYvupnNeNdh4IEVPTKISMKAMkZZCMRGvTJkOpf9hbqc1rqzQpb8lR43AlPv5uZ3H9G9qew6N-y-tZjk-4Rjz2</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Informal Data Transformation Considered Harmful</title><source>arXiv.org</source><creator>Daimler, Eric ; Wisnesky, Ryan</creator><creatorcontrib>Daimler, Eric ; Wisnesky, Ryan</creatorcontrib><description>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</description><identifier>DOI: 10.48550/arxiv.2001.00338</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - 
Databases</subject><creationdate>2020-01</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2001.00338$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2001.00338$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Daimler, Eric</creatorcontrib><creatorcontrib>Wisnesky, Ryan</creatorcontrib><title>Informal Data Transformation Considered Harmful</title><description>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - 
Databases</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzs1qwzAQBGBdcghJHiCn-gXsrLSWvD0W589g6MV3s64kMPinyElI3j6J29PAMAyfEFsJSUpaw47Dvb0lCkAmAIi0FLti8GPouYv2fOGoCjxMc3FpxyHKx2FqrQvORmcOvb92a7Hw3E1u858rUR0PVX6Oy-9TkX-VMZuMYvupnNeNdh4IEVPTKISMKAMkZZCMRGvTJkOpf9hbqc1rqzQpb8lR43AlPv5uZ3H9G9qew6N-y-tZjk-4Rjz2</recordid><startdate>20200102</startdate><enddate>20200102</enddate><creator>Daimler, Eric</creator><creator>Wisnesky, Ryan</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200102</creationdate><title>Informal Data Transformation Considered Harmful</title><author>Daimler, Eric ; Wisnesky, Ryan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-d92ef5b5ef0833346b23078870382638613dd4b7315cafd156b5e2582fd8e8be3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Daimler, Eric</creatorcontrib><creatorcontrib>Wisnesky, Ryan</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Daimler, Eric</au><au>Wisnesky, Ryan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Informal Data Transformation Considered Harmful</atitle><date>2020-01-02</date><risdate>2020</risdate><abstract>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and 
validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</abstract><doi>10.48550/arxiv.2001.00338</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2001.00338
language	eng
recordid	cdi_arxiv_primary_2001_00338
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Databases
title	Informal Data Transformation Considered Harmful
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T13%3A51%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Informal%20Data%20Transformation%20Considered%20Harmful&rft.au=Daimler,%20Eric&rft.date=2020-01-02&rft_id=info:doi/10.48550/arxiv.2001.00338&rft_dat=%3Carxiv_GOX%3E2001_00338%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true