Lightweight Knowledge Representations for Automating Data Analysis
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those...
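Main authors: | Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |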
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian |
description | The principal goal of data science is to derive meaningful information from
data. To do this, data scientists develop a space of analytic possibilities and
from it reach their information goals by using their knowledge of the domain,
the available data, the operations that can be performed on those data, the
algorithms/models that are fed the data, and how all of these facets
interweave. In this work, we take the first steps towards automating a key
aspect of the data science pipeline: data analysis. We present an extensible
taxonomy of data analytic operations that scopes across domains and data, as
well as a method for codifying domain-specific knowledge that links this
analytics taxonomy to actual data. We validate the functionality of our
analytics taxonomy by implementing a system that leverages it, alongside domain
labelings for 8 distinct domains, to automatically generate a space of
answerable questions and associated analytic plans. In this way, we produce
information spaces over data that enable complex analyses and search over this
data and pave the way for fully automated data analysis. |
doi_str_mv | 10.48550/arxiv.2311.12848 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2311_12848</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2311_12848</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-2347e33c75c433fc5c13d25c7d0f31f818f775d83e3d1976a8b16a233628a053</originalsourceid><addsrcrecordid>eNotz8tOwzAUBFBvWKDCB7DCP5AQ-8axWYbyKGqkSrT76OJHsJQ6lW0o_Xva0s2MZjPSIeSOVWWthKgeMP76n5IDYyXjqlbX5Knzw1fe21PSZZj2ozWDpR92F22yIWP2U0jUTZG233naHncY6DNmpG3A8ZB8uiFXDsdkby89I-vXl818UXSrt_d52xXYSFVwqKUF0FLoGsBpoRkYLrQ0lQPmFFNOSmEUWDDsUTaoPlmDHKDhCisBM3L__3o29LvotxgP_cnSny3wB6gkQ-w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Lightweight Knowledge Representations for Automating Data Analysis</title><source>arXiv.org</source><creator>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</creator><creatorcontrib>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</creatorcontrib><description>The principal goal of data science is to derive meaningful information from
data. To do this, data scientists develop a space of analytic possibilities and
from it reach their information goals by using their knowledge of the domain,
the available data, the operations that can be performed on those data, the
algorithms/models that are fed the data, and how all of these facets
interweave. In this work, we take the first steps towards automating a key
aspect of the data science pipeline: data analysis. We present an extensible
taxonomy of data analytic operations that scopes across domains and data, as
well as a method for codifying domain-specific knowledge that links this
analytics taxonomy to actual data. We validate the functionality of our
analytics taxonomy by implementing a system that leverages it, alongside domain
labelings for 8 distinct domains, to automatically generate a space of
answerable questions and associated analytic plans. In this way, we produce
information spaces over data that enable complex analyses and search over this
data and pave the way for fully automated data analysis.</description><identifier>DOI: 10.48550/arxiv.2311.12848</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Databases</subject><creationdate>2023-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2311.12848$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2311.12848$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Sterbentz, Marko</creatorcontrib><creatorcontrib>Barrie, Cameron</creatorcontrib><creatorcontrib>Hooshmand, Donna</creatorcontrib><creatorcontrib>Shahi, Shubham</creatorcontrib><creatorcontrib>Dutta, Abhratanu</creatorcontrib><creatorcontrib>Pack, Harper</creatorcontrib><creatorcontrib>Zhao, Andong Li</creatorcontrib><creatorcontrib>Paley, Andrew</creatorcontrib><creatorcontrib>Einarsson, Alexander</creatorcontrib><creatorcontrib>Hammond, Kristian</creatorcontrib><title>Lightweight Knowledge Representations for Automating Data Analysis</title><description>The principal goal of data science is to derive meaningful information from
data. To do this, data scientists develop a space of analytic possibilities and
from it reach their information goals by using their knowledge of the domain,
the available data, the operations that can be performed on those data, the
algorithms/models that are fed the data, and how all of these facets
interweave. In this work, we take the first steps towards automating a key
aspect of the data science pipeline: data analysis. We present an extensible
taxonomy of data analytic operations that scopes across domains and data, as
well as a method for codifying domain-specific knowledge that links this
analytics taxonomy to actual data. We validate the functionality of our
analytics taxonomy by implementing a system that leverages it, alongside domain
labelings for 8 distinct domains, to automatically generate a space of
answerable questions and associated analytic plans. In this way, we produce
information spaces over data that enable complex analyses and search over this
data and pave the way for fully automated data analysis.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Databases</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz8tOwzAUBFBvWKDCB7DCP5AQ-8axWYbyKGqkSrT76OJHsJQ6lW0o_Xva0s2MZjPSIeSOVWWthKgeMP76n5IDYyXjqlbX5Knzw1fe21PSZZj2ozWDpR92F22yIWP2U0jUTZG233naHncY6DNmpG3A8ZB8uiFXDsdkby89I-vXl818UXSrt_d52xXYSFVwqKUF0FLoGsBpoRkYLrQ0lQPmFFNOSmEUWDDsUTaoPlmDHKDhCisBM3L__3o29LvotxgP_cnSny3wB6gkQ-w</recordid><startdate>20231015</startdate><enddate>20231015</enddate><creator>Sterbentz, Marko</creator><creator>Barrie, Cameron</creator><creator>Hooshmand, Donna</creator><creator>Shahi, Shubham</creator><creator>Dutta, Abhratanu</creator><creator>Pack, Harper</creator><creator>Zhao, Andong Li</creator><creator>Paley, Andrew</creator><creator>Einarsson, Alexander</creator><creator>Hammond, Kristian</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231015</creationdate><title>Lightweight Knowledge Representations for Automating Data Analysis</title><author>Sterbentz, Marko ; Barrie, Cameron ; Hooshmand, Donna ; Shahi, Shubham ; Dutta, Abhratanu ; Pack, Harper ; Zhao, Andong Li ; Paley, Andrew ; Einarsson, Alexander ; Hammond, Kristian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a678-2347e33c75c433fc5c13d25c7d0f31f818f775d83e3d1976a8b16a233628a053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Sterbentz, Marko</creatorcontrib><creatorcontrib>Barrie, Cameron</creatorcontrib><creatorcontrib>Hooshmand, Donna</creatorcontrib><creatorcontrib>Shahi, 
Shubham</creatorcontrib><creatorcontrib>Dutta, Abhratanu</creatorcontrib><creatorcontrib>Pack, Harper</creatorcontrib><creatorcontrib>Zhao, Andong Li</creatorcontrib><creatorcontrib>Paley, Andrew</creatorcontrib><creatorcontrib>Einarsson, Alexander</creatorcontrib><creatorcontrib>Hammond, Kristian</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sterbentz, Marko</au><au>Barrie, Cameron</au><au>Hooshmand, Donna</au><au>Shahi, Shubham</au><au>Dutta, Abhratanu</au><au>Pack, Harper</au><au>Zhao, Andong Li</au><au>Paley, Andrew</au><au>Einarsson, Alexander</au><au>Hammond, Kristian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Lightweight Knowledge Representations for Automating Data Analysis</atitle><date>2023-10-15</date><risdate>2023</risdate><abstract>The principal goal of data science is to derive meaningful information from
data. To do this, data scientists develop a space of analytic possibilities and
from it reach their information goals by using their knowledge of the domain,
the available data, the operations that can be performed on those data, the
algorithms/models that are fed the data, and how all of these facets
interweave. In this work, we take the first steps towards automating a key
aspect of the data science pipeline: data analysis. We present an extensible
taxonomy of data analytic operations that scopes across domains and data, as
well as a method for codifying domain-specific knowledge that links this
analytics taxonomy to actual data. We validate the functionality of our
analytics taxonomy by implementing a system that leverages it, alongside domain
labelings for 8 distinct domains, to automatically generate a space of
answerable questions and associated analytic plans. In this way, we produce
information spaces over data that enable complex analyses and search over this
data and pave the way for fully automated data analysis.</abstract><doi>10.48550/arxiv.2311.12848</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2311.12848 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2311_12848 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Databases |
title | Lightweight Knowledge Representations for Automating Data Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T10%3A52%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lightweight%20Knowledge%20Representations%20for%20Automating%20Data%20Analysis&rft.au=Sterbentz,%20Marko&rft.date=2023-10-15&rft_id=info:doi/10.48550/arxiv.2311.12848&rft_dat=%3Carxiv_GOX%3E2311_12848%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |