Fine-Grained Lineage for Safer Notebook Interactions

Bibliographic details
Main authors:	Macke, Stephen; Gong, Hongpu; Lee, Doris Jung-Lin; Head, Andrew; Xin, Doris; Parameswaran, Aditya
Format:	Article
Language:	English
Subjects:	Computer Science - Databases; Computer Science - Human-Computer Interaction; Computer Science - Programming Languages; Computer Science - Software Engineering

creator	Macke, Stephen; Gong, Hongpu; Lee, Doris Jung-Lin; Head, Andrew; Xin, Doris; Parameswaran, Aditya
description	Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to execute their workflows interactively and enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the notebook's visible code, making execution behavior difficult to reason about, and leading to errors and lack of reproducibility. We present NBSafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. NBSafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate NBSafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, NBSafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that NBSafety identified as resolving safety issues were more than 7× more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using NBSafety and were therefore not influenced by its suggestions.
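The abstract describes lineage management only at a high level. As a rough illustration, and not NBSafety's actual implementation or API, the Python sketch below assumes that the global names each cell execution reads and writes are already known (NBSafety instead obtains this through runtime tracing and static analysis). The names `LineageTracker`, `run_cell`, `is_stale`, `unsafe`, and `resolving_cells` are hypothetical, as are the cell contents in the comments. In this sketch a symbol counts as stale when something it was computed from has been redefined since, and a cell is suggested as "resolving" when re-running it would refresh a stale symbol whose own direct inputs are current.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    """A global notebook variable plus the lineage metadata tracked for it."""
    name: str
    version: int = 0                                    # bumped on every (re)definition
    deps: dict[str, int] = field(default_factory=dict)  # dependency name -> version used
    defined_in: Optional[int] = None                    # id of the cell that last wrote it


class LineageTracker:
    """Hypothetical illustration of cell-level lineage; not NBSafety's code or API."""

    def __init__(self) -> None:
        self.symbols: dict[str, Symbol] = {}

    def run_cell(self, cell_id: int, reads: set[str], writes: set[str]) -> None:
        """Record one cell execution.

        Assumption: the caller supplies the global names the cell read and wrote.
        Coarse model: every symbol the cell writes depends on every already-defined
        symbol it reads but does not also write.
        """
        used = {n: self.symbols[n].version for n in reads - writes if n in self.symbols}
        for name in writes:
            sym = self.symbols.setdefault(name, Symbol(name))
            sym.version += 1
            sym.deps = dict(used)
            sym.defined_in = cell_id

    def is_stale(self, name: str) -> bool:
        """True if anything `name` was (transitively) computed from has changed since."""
        sym = self.symbols[name]
        return any(
            self.symbols[dep].version != ver or self.is_stale(dep)
            for dep, ver in sym.deps.items()
        )

    def unsafe(self, reads: set[str]) -> set[str]:
        """Stale symbols that a cell reading `reads` would consume."""
        return {n for n in reads if n in self.symbols and self.is_stale(n)}

    def resolving_cells(self) -> set[int]:
        """Cells whose re-execution would refresh a stale symbol right now:
        the symbol is stale, but all of its direct dependencies are fresh."""
        return {
            sym.defined_in
            for sym in self.symbols.values()
            if self.is_stale(sym.name) and not any(self.is_stale(d) for d in sym.deps)
        }


if __name__ == "__main__":
    t = LineageTracker()
    t.run_cell(1, reads=set(), writes={"a"})  # a = load()
    t.run_cell(2, reads={"a"}, writes={"b"})  # b = a * 2
    t.run_cell(3, reads={"b"}, writes={"c"})  # c = b + 1
    t.run_cell(4, reads=set(), writes={"a"})  # user edits and re-runs the cell defining a
    print(t.unsafe({"c"}))      # {'c'}
    print(t.resolving_cells())  # {2}
```

Here `c` is flagged as unsafe to read because `a` changed two steps upstream, and cell 2 is suggested first; once it is re-run, cell 3 becomes the suggestion. The rule that every write depends on every read in the same cell is deliberately coarse to keep the sketch short.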
doi_str_mv	10.48550/arxiv.2012.06981
format	Article
date	2020-12-13
rights	http://creativecommons.org/licenses/by/4.0
oa	free_for_read
link	https://arxiv.org/abs/2012.06981
backlink	https://doi.org/10.48550/arXiv.2012.06981
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2012.06981
language	eng
recordid	cdi_arxiv_primary_2012_06981
source	arXiv.org
subjects	Computer Science - Databases; Computer Science - Human-Computer Interaction; Computer Science - Programming Languages; Computer Science - Software Engineering
title	Fine-Grained Lineage for Safer Notebook Interactions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T17%3A36%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fine-Grained%20Lineage%20for%20Safer%20Notebook%20Interactions&rft.au=Macke,%20Stephen&rft.date=2020-12-13&rft_id=info:doi/10.48550/arxiv.2012.06981&rft_dat=%3Carxiv_GOX%3E2012_06981%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true