Fine-grained lineage for safer notebook interactions

Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to enjoy parti...

Full description

Bibliographic details
Published in: Proceedings of the VLDB Endowment, 2021-02, Vol. 14 (6), p. 1093-1101
Main authors: Macke, Stephen; Gong, Hongpu; Lee, Doris Jung-Lin; Head, Andrew; Xin, Doris; Parameswaran, Aditya
Format: Article
Language: English
Online access: Full text
container_end_page 1101
container_issue 6
container_start_page 1093
container_title Proceedings of the VLDB Endowment
container_volume 14
creator Macke, Stephen
Gong, Hongpu
Lee, Doris Jung-Lin
Head, Andrew
Xin, Doris
Parameswaran, Aditya
description Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates, making execution behavior difficult to reason about, and leading to errors and lack of reproducibility. We present nbsafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. nbsafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate nbsafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, nbsafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that nbsafety identified as resolving safety issues were more than 7X more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using nbsafety and were therefore not influenced by its suggestions.
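The lineage idea in the abstract can be illustrated with a small sketch. This is not nbsafety's implementation or API: the class `LineageTracker`, the helper `reads_and_writes`, and the method `cells_to_rerun` are hypothetical names, and the logic is a deliberate simplification that only compares timestamps of top-level names, assuming each cell is plain Python source. It uses the standard-library `ast` module for the static part and a logical clock in place of runtime tracing.

```python
import ast


def reads_and_writes(source):
    """Statically collect the names a cell loads and the names it stores or deletes."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # ast.Store or ast.Del
                writes.add(node.id)
    return reads, writes


class LineageTracker:
    """Hypothetical, simplified lineage bookkeeping (not the nbsafety kernel)."""

    def __init__(self):
        self.clock = 0        # logical time, incremented per cell execution
        self.defined_at = {}  # name -> time of its most recent (re)definition
        self.cells = {}       # cell id -> (time of last run, names it read)

    def run_cell(self, cell_id, source):
        """Record one execution of a cell; the source is analyzed, not executed."""
        self.clock += 1
        reads, writes = reads_and_writes(source)
        for name in writes:
            self.defined_at[name] = self.clock
        self.cells[cell_id] = (self.clock, reads)

    def cells_to_rerun(self):
        """Cells that read a name which was redefined after they last ran."""
        return sorted(
            cell_id
            for cell_id, (ran_at, reads) in self.cells.items()
            if any(self.defined_at.get(name, 0) > ran_at for name in reads)
        )


tracker = LineageTracker()
tracker.run_cell(1, "x = 1")
tracker.run_cell(2, "y = x + 1")
tracker.run_cell(3, "print(y)")
tracker.run_cell(1, "x = 10")    # x changes after cell 2 consumed it
print(tracker.cells_to_rerun())  # [2]
```

Cell 3 is not reported until cell 2 is rerun, because the sketch does not propagate staleness transitively through `y`; a real system along the lines the abstract describes would need finer-grained tracking than this.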
doi_str_mv 10.14778/3447689.3447712
format Article
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2021-02, Vol.14 (6), p.1093-1101
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_3447689_3447712
source ACM Digital Library
title Fine-grained lineage for safer notebook interactions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T17%3A32%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fine-grained%20lineage%20for%20safer%20notebook%20interactions&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Macke,%20Stephen&rft.date=2021-02-01&rft.volume=14&rft.issue=6&rft.spage=1093&rft.epage=1101&rft.pages=1093-1101&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3447689.3447712&rft_dat=%3Ccrossref%3E10_14778_3447689_3447712%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true