Lightweight Knowledge Representations for Automating Data Analysis

The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those...

Bibliographic Details
Main authors: Sterbentz, Marko; Barrie, Cameron; Hooshmand, Donna; Shahi, Shubham; Dutta, Abhratanu; Pack, Harper; Zhao, Andong Li; Paley, Andrew; Einarsson, Alexander; Hammond, Kristian
Format: Article
Language: eng
creator Sterbentz, Marko
Barrie, Cameron
Hooshmand, Donna
Shahi, Shubham
Dutta, Abhratanu
Pack, Harper
Zhao, Andong Li
Paley, Andrew
Einarsson, Alexander
Hammond, Kristian
description The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
doi_str_mv 10.48550/arxiv.2311.12848
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2311_12848</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2311_12848</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-2347e33c75c433fc5c13d25c7d0f31f818f775d83e3d1976a8b16a233628a053</originalsourceid><addsrcrecordid>eNotz8tOwzAUBFBvWKDCB7DCP5AQ-8axWYbyKGqkSrT76OJHsJQ6lW0o_Xva0s2MZjPSIeSOVWWthKgeMP76n5IDYyXjqlbX5Knzw1fe21PSZZj2ozWDpR92F22yIWP2U0jUTZG233naHncY6DNmpG3A8ZB8uiFXDsdkby89I-vXl818UXSrt_d52xXYSFVwqKUF0FLoGsBpoRkYLrQ0lQPmFFNOSmEUWDDsUTaoPlmDHKDhCisBM3L__3o29LvotxgP_cnSny3wB6gkQ-w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Lightweight Knowledge Representations for Automating Data Analysis</title><source>arXiv.org</source><creator>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</creator><creatorcontrib>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</creatorcontrib><description>The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. 
We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.</description><identifier>DOI: 10.48550/arxiv.2311.12848</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Databases</subject><creationdate>2023-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2311.12848$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2311.12848$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Sterbentz, Marko</creatorcontrib><creatorcontrib>Barrie, Cameron</creatorcontrib><creatorcontrib>Hooshmand, Donna</creatorcontrib><creatorcontrib>Shahi, Shubham</creatorcontrib><creatorcontrib>Dutta, Abhratanu</creatorcontrib><creatorcontrib>Pack, Harper</creatorcontrib><creatorcontrib>Zhao, Andong Li</creatorcontrib><creatorcontrib>Paley, Andrew</creatorcontrib><creatorcontrib>Einarsson, Alexander</creatorcontrib><creatorcontrib>Hammond, Kristian</creatorcontrib><title>Lightweight Knowledge Representations for 
Automating Data Analysis</title><description>The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Databases</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz8tOwzAUBFBvWKDCB7DCP5AQ-8axWYbyKGqkSrT76OJHsJQ6lW0o_Xva0s2MZjPSIeSOVWWthKgeMP76n5IDYyXjqlbX5Knzw1fe21PSZZj2ozWDpR92F22yIWP2U0jUTZG233naHncY6DNmpG3A8ZB8uiFXDsdkby89I-vXl818UXSrt_d52xXYSFVwqKUF0FLoGsBpoRkYLrQ0lQPmFFNOSmEUWDDsUTaoPlmDHKDhCisBM3L__3o29LvotxgP_cnSny3wB6gkQ-w</recordid><startdate>20231015</startdate><enddate>20231015</enddate><creator>Sterbentz, Marko</creator><creator>Barrie, Cameron</creator><creator>Hooshmand, Donna</creator><creator>Shahi, Shubham</creator><creator>Dutta, Abhratanu</creator><creator>Pack, Harper</creator><creator>Zhao, 
Andong Li</creator><creator>Paley, Andrew</creator><creator>Einarsson, Alexander</creator><creator>Hammond, Kristian</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231015</creationdate><title>Lightweight Knowledge Representations for Automating Data Analysis</title><author>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-2347e33c75c433fc5c13d25c7d0f31f818f775d83e3d1976a8b16a233628a053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Sterbentz, Marko</creatorcontrib><creatorcontrib>Barrie, Cameron</creatorcontrib><creatorcontrib>Hooshmand, Donna</creatorcontrib><creatorcontrib>Shahi, Shubham</creatorcontrib><creatorcontrib>Dutta, Abhratanu</creatorcontrib><creatorcontrib>Pack, Harper</creatorcontrib><creatorcontrib>Zhao, Andong Li</creatorcontrib><creatorcontrib>Paley, Andrew</creatorcontrib><creatorcontrib>Einarsson, Alexander</creatorcontrib><creatorcontrib>Hammond, Kristian</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sterbentz, Marko</au><au>Barrie, Cameron</au><au>Hooshmand, Donna</au><au>Shahi, Shubham</au><au>Dutta, Abhratanu</au><au>Pack, Harper</au><au>Zhao, Andong Li</au><au>Paley, Andrew</au><au>Einarsson, Alexander</au><au>Hammond, Kristian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Lightweight Knowledge Representations for Automating Data 
Analysis</atitle><date>2023-10-15</date><risdate>2023</risdate><abstract>The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.</abstract><doi>10.48550/arxiv.2311.12848</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2311.12848
language eng
recordid cdi_arxiv_primary_2311_12848
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Databases
title Lightweight Knowledge Representations for Automating Data Analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T10%3A52%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lightweight%20Knowledge%20Representations%20for%20Automating%20Data%20Analysis&rft.au=Sterbentz,%20Marko&rft.date=2023-10-15&rft_id=info:doi/10.48550/arxiv.2311.12848&rft_dat=%3Carxiv_GOX%3E2311_12848%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true