Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Domain experts can play a crucial role in guiding data scientists to optimize
machine learning models while ensuring contextual relevance for downstream use.
However, in current workflows, such collaboration is challenging due to
differing expertise, abstract documentation practices, and lack of access and
visibility into low-level implementation artifacts. To address these challenges
and enable domain expert participation, we introduce CellSync, a collaboration
framework comprising (1) a Jupyter Notebook extension that continuously tracks
changes to dataframes and model metrics and (2) a Large Language Model powered
visualization dashboard that makes those changes interpretable to domain
experts. Through CellSync's cell-level dataset visualization with code
summaries, domain experts can interactively examine how individual data and
modeling operations impact different data segments. The chat features enable
data-centric conversations and targeted feedback to data scientists. Our
preliminary evaluation shows that CellSync provides transparency and promotes
critical discussions about the intents and implications of data operations. |
---|---|
DOI: | 10.48550/arxiv.2405.02260 |