Informal Data Transformation Considered Harmful

In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and val...

Bibliographic details
Main authors: Daimler, Eric; Wisnesky, Ryan
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Daimler, Eric
Wisnesky, Ryan
description In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex post facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it is transformed (migrated, integrated, composed, queried, viewed, etc.) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.
doi_str_mv 10.48550/arxiv.2001.00338
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2001_00338</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2001_00338</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-d92ef5b5ef0833346b23078870382638613dd4b7315cafd156b5e2582fd8e8be3</originalsourceid><addsrcrecordid>eNotzs1qwzAQBGBdcghJHiCn-gXsrLSWvD0W589g6MV3s64kMPinyElI3j6J29PAMAyfEFsJSUpaw47Dvb0lCkAmAIi0FLti8GPouYv2fOGoCjxMc3FpxyHKx2FqrQvORmcOvb92a7Hw3E1u858rUR0PVX6Oy-9TkX-VMZuMYvupnNeNdh4IEVPTKISMKAMkZZCMRGvTJkOpf9hbqc1rqzQpb8lR43AlPv5uZ3H9G9qew6N-y-tZjk-4Rjz2</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Informal Data Transformation Considered Harmful</title><source>arXiv.org</source><creator>Daimler, Eric ; Wisnesky, Ryan</creator><creatorcontrib>Daimler, Eric ; Wisnesky, Ryan</creatorcontrib><description>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</description><identifier>DOI: 10.48550/arxiv.2001.00338</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - 
Databases</subject><creationdate>2020-01</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2001.00338$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2001.00338$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Daimler, Eric</creatorcontrib><creatorcontrib>Wisnesky, Ryan</creatorcontrib><title>Informal Data Transformation Considered Harmful</title><description>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - 
Databases</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzs1qwzAQBGBdcghJHiCn-gXsrLSWvD0W589g6MV3s64kMPinyElI3j6J29PAMAyfEFsJSUpaw47Dvb0lCkAmAIi0FLti8GPouYv2fOGoCjxMc3FpxyHKx2FqrQvORmcOvb92a7Hw3E1u858rUR0PVX6Oy-9TkX-VMZuMYvupnNeNdh4IEVPTKISMKAMkZZCMRGvTJkOpf9hbqc1rqzQpb8lR43AlPv5uZ3H9G9qew6N-y-tZjk-4Rjz2</recordid><startdate>20200102</startdate><enddate>20200102</enddate><creator>Daimler, Eric</creator><creator>Wisnesky, Ryan</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200102</creationdate><title>Informal Data Transformation Considered Harmful</title><author>Daimler, Eric ; Wisnesky, Ryan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-d92ef5b5ef0833346b23078870382638613dd4b7315cafd156b5e2582fd8e8be3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Daimler, Eric</creatorcontrib><creatorcontrib>Wisnesky, Ryan</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Daimler, Eric</au><au>Wisnesky, Ryan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Informal Data Transformation Considered Harmful</atitle><date>2020-01-02</date><risdate>2020</risdate><abstract>In this paper we take the common position that AI systems are limited more by the integrity of the data they are learning from than the sophistication of their algorithms, and we take the uncommon position that the solution to achieving better data integrity in the enterprise is not to clean and 
validate data ex-post-facto whenever needed (the so-called data lake approach to data management, which can lead to data scientists spending 80% of their time cleaning data), but rather to formally and automatically guarantee that data integrity is preserved as it transformed (migrated, integrated, composed, queried, viewed, etc) throughout the enterprise, so that data and programs that depend on that data need not constantly be re-validated for every particular use.</abstract><doi>10.48550/arxiv.2001.00338</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2001.00338
ispartof
issn
language eng
recordid cdi_arxiv_primary_2001_00338
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Databases
title Informal Data Transformation Considered Harmful
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T13%3A51%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Informal%20Data%20Transformation%20Considered%20Harmful&rft.au=Daimler,%20Eric&rft.date=2020-01-02&rft_id=info:doi/10.48550/arxiv.2001.00338&rft_dat=%3Carxiv_GOX%3E2001_00338%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true