Mining SQL workloads for learning analysis behavior
Published in: | Information systems (Oxford) 2022-09, Vol.108, p.102004, Article 102004 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | This paper presents a set of analyses aimed at better understanding the SQLShare workload (Jain et al., 2016) and learning users’ analysis behavior. SQLShare is a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to Jain et al. (2016), this workload is the only one containing primarily ad-hoc, hand-written queries over user-uploaded datasets. In this paper we analyze this workload by comparing users’ explorations (sequences of queries), looking for common SQL operations performed by users during data analysis, and studying query complexity. We use a clustering algorithm to retrieve groups of similar explorations, and we analyze the resulting clusters through a range of statistical and visual indicators that explain the analysis patterns within each cluster. To our knowledge, this is the first attempt to characterize human analysis behavior in SQL workloads.
• We propose an approach for learning analysis patterns in SQL workloads. • We define a set of similarity measures tailored for SQL queries and explorations. • We cluster similar explorations using an innovative clustering process. • We use a large palette of indicators for profiling and analyzing users’ behavior. • We conduct a large experimental evaluation of the proposal over the SQLShare workload. |
---|---|
ISSN: | 0306-4379 1873-6076 |
DOI: | 10.1016/j.is.2022.102004 |