Principles for data analysis workflows

Traditional data science education often omits training on research workflows: the process that moves a scientific investigation from raw data to coherent research question to insightful contribution. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2020-07
Hauptverfasser: Stoudt, Sara, Vasquez, Valeri N, Martinez, Ciera C
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Traditional data science education often omits training on research workflows: the process that moves a scientific investigation from raw data to coherent research question to insightful contribution. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining three phases: the Exploratory, Refinement, and Polishing Phases. Each workflow phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between principles for data-intensive research workflows and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students and current professionals.
ISSN:2331-8422
DOI:10.48550/arxiv.2007.08708