Mining SQL workloads for learning analysis behavior

Bibliographic details
Published in: Information systems (Oxford), 2022-09, Vol. 108, p. 102004, Article 102004
Main authors: Moreau, Clement; Legroux, Clement; Peralta, Veronika; Hamrouni, Mohamed Ali
Format: Article
Language: English
Description
Abstract: This paper presents a set of analyses aimed at better understanding the SQLShare workload (Jain et al., 2016) and learning users’ analysis behavior. SQLShare is a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to Jain et al. (2016), this workload is the only one containing primarily ad-hoc hand-written queries over user-uploaded datasets. In this paper we analyze this workload by comparing users’ explorations (sequences of queries), looking for common SQL operations performed by the users during data analysis, and studying query complexity. We use a clustering algorithm to retrieve groups of similar explorations, and we analyze the obtained clusters through many statistical and visual indicators to explain the analysis patterns inside clusters. To our knowledge, this is the first attempt to characterize human analysis behavior in SQL workloads.

Highlights:
• We propose an approach for learning analysis patterns in SQL workloads.
• We define a set of similarity measures tailored for SQL queries and explorations.
• We cluster similar explorations using an innovative clustering process.
• We use a large palette of indicators for profiling and analyzing users’ behavior.
• We conduct a large experimental evaluation of the proposal over the SQLShare workload.
ISSN: 0306-4379, 1873-6076
DOI: 10.1016/j.is.2022.102004
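
Illustrative sketch: the abstract describes comparing explorations (sequences of queries) by the SQL operations they use and grouping similar ones. The Python below is a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the keyword list `SQL_OPS`, the use of Jaccard similarity over operation sets, the fixed `threshold`, and the simple single-linkage grouping that stands in for the paper's clustering process.

```python
import re
from itertools import combinations

# Hypothetical operation vocabulary; the paper's actual features are not given here.
SQL_OPS = ("select", "join", "where", "group by", "order by",
           "union", "distinct", "having")


def operations(query: str) -> frozenset:
    """Return the set of SQL operations (from SQL_OPS) that appear in a query."""
    text = " ".join(query.lower().split())  # lowercase and normalize whitespace
    return frozenset(op for op in SQL_OPS if re.search(rf"\b{op}\b", text))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exploration_similarity(e1: list, e2: list) -> float:
    """Compare two explorations by the operations used anywhere in each."""
    ops1 = frozenset().union(*(operations(q) for q in e1))
    ops2 = frozenset().union(*(operations(q) for q in e2))
    return jaccard(ops1, ops2)


def cluster(explorations: list, threshold: float = 0.5) -> list:
    """Group exploration indices whose similarity reaches the threshold.

    Single-linkage grouping via union-find: two explorations end up in the
    same group if a chain of sufficiently similar pairs connects them.
    """
    parent = list(range(len(explorations)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(explorations)), 2):
        if exploration_similarity(explorations[i], explorations[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(explorations)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy explorations (made-up queries, not taken from SQLShare).
explorations = [
    ["SELECT * FROM t", "SELECT a FROM t WHERE a > 1"],
    ["SELECT a FROM t WHERE a < 5"],
    ["SELECT a, COUNT(*) FROM t JOIN u ON t.id = u.id GROUP BY a ORDER BY a"],
]
groups = cluster(explorations)
print(groups)
```

With these toy inputs the first two explorations share the same operation set and are grouped together, while the third, which adds a join, grouping, and ordering, stays on its own. A set-based measure like this ignores query order within an exploration; a sequence-aware measure would be needed to capture that.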