Fine-Grained Lineage for Safer Notebook Interactions
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Computational notebooks have emerged as the platform of choice for data
science and analytical workflows, enabling rapid iteration and exploration. By
keeping intermediate program state in memory and segmenting units of execution
into so-called "cells", notebooks allow users to execute their workflows
interactively and enjoy particularly tight feedback. However, as cells are
added, removed, reordered, and rerun, this hidden intermediate state
accumulates in a way that is not necessarily correlated with the notebook's
visible code, making execution behavior difficult to reason about, and leading
to errors and lack of reproducibility. We present NBSafety, a custom Jupyter
kernel that uses runtime tracing and static analysis to automatically manage
lineage associated with cell execution and global notebook state. NBSafety
detects and prevents errors that users make during unaided notebook
interactions, all while preserving the flexibility of existing notebook
semantics. We evaluate NBSafety's ability to prevent erroneous interactions by
replaying and analyzing 666 real notebook sessions. Of these, NBSafety
identified 117 sessions with potential safety errors, and in the remaining 549
sessions, the cells that NBSafety identified as resolving safety issues were
more than $7\times$ more likely to be selected by users for re-execution
compared to a random baseline, even though the users were not using NBSafety
and were therefore not influenced by its suggestions. |
---|---|
DOI: | 10.48550/arxiv.2012.06981 |
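
The stale-state problem described in the summary can be illustrated with a minimal sketch. This is not NBSafety's implementation: the class `ToyLineage`, its methods, and the timestamp rule are hypothetical names and simplifications chosen for illustration, and the sketch uses static name analysis only (no runtime tracing, no actual cell execution). It assumes a symbol is stale when any symbol it was computed from has been reassigned more recently.

```python
import ast


def names(src):
    """Return (loaded, stored) variable names found in a cell's source."""
    loaded, stored = set(), set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            (stored if isinstance(node.ctx, ast.Store) else loaded).add(node.id)
    return loaded, stored


class ToyLineage:
    """Hypothetical, simplified lineage tracker (illustration only)."""

    def __init__(self):
        self.clock = 0        # logical execution counter
        self.updated_at = {}  # symbol -> time of its last assignment
        self.deps = {}        # symbol -> symbols read when it was assigned
        self.cells = {}       # cell id -> most recently run source

    def run(self, cell_id, src):
        """Record that a cell was run; the source is analyzed, not executed."""
        self.clock += 1
        self.cells[cell_id] = src
        loaded, stored = names(src)
        for sym in stored:
            self.updated_at[sym] = self.clock
            self.deps[sym] = loaded - {sym}

    def stale_symbols(self):
        """Symbols older than at least one symbol they were computed from."""
        return {
            sym
            for sym, parents in self.deps.items()
            if any(self.updated_at.get(p, 0) > self.updated_at[sym] for p in parents)
        }

    def unsafe_cells(self):
        """Cells that would read a stale symbol if rerun now."""
        stale = self.stale_symbols()
        return sorted(c for c, src in self.cells.items() if names(src)[0] & stale)

    def refresher_cells(self):
        """Cells whose rerun would reassign a stale symbol."""
        stale = self.stale_symbols()
        return sorted(c for c, src in self.cells.items() if names(src)[1] & stale)


lineage = ToyLineage()
lineage.run(1, "x = 1")
lineage.run(2, "y = x + 1")
lineage.run(3, "print(y)")
lineage.run(1, "x = 10")          # the user edits and reruns cell 1 only

print(lineage.stale_symbols())    # {'y'}
print(lineage.unsafe_cells())     # [3]
print(lineage.refresher_cells())  # [2]
```

In this toy session, rerunning cell 1 leaves `y` computed from the old value of `x`, so cell 3 is flagged as unsafe to rerun and cell 2 is suggested as the cell that would resolve the issue.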