Lightweight Knowledge Representations for Automating Data Analysis
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The principal goal of data science is to derive meaningful information from
data. To do this, data scientists develop a space of analytic possibilities and
from it reach their information goals by using their knowledge of the domain,
the available data, the operations that can be performed on those data, the
algorithms/models that are fed the data, and how all of these facets
interweave. In this work, we take the first steps towards automating a key
aspect of the data science pipeline: data analysis. We present an extensible
taxonomy of data analytic operations that scopes across domains and data, as
well as a method for codifying domain-specific knowledge that links this
analytics taxonomy to actual data. We validate the functionality of our
analytics taxonomy by implementing a system that leverages it, alongside domain
labelings for 8 distinct domains, to automatically generate a space of
answerable questions and associated analytic plans. In this way, we produce
information spaces over data that enable complex analyses and search over this
data and pave the way for fully automated data analysis. |
---|---|
DOI: | 10.48550/arxiv.2311.12848 |