Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata
Format: Article
Language: English
Abstract: Data is central to the development and evaluation of machine learning (ML)
models. However, the use of problematic or inappropriate datasets can result in
harms when the resulting models are deployed. To encourage responsible AI
practice through more deliberate reflection on datasets and transparency around
the processes by which they are created, researchers and practitioners have
begun to advocate for increased data documentation and have proposed several
data documentation frameworks. However, there is little research on whether
these data documentation frameworks meet the needs of ML practitioners, who
both create and consume datasets. To address this gap, we set out to understand
ML practitioners' data documentation perceptions, needs, challenges, and
desiderata, with the goal of deriving design requirements that can inform
future data documentation frameworks. We conducted a series of semi-structured
interviews with 14 ML practitioners at a single large, international technology
company. We had them answer a list of questions taken from Datasheets for
Datasets (Gebru et al., 2021). Our findings show that current approaches to data
documentation are largely ad hoc and myopic in nature. Participants expressed
needs for data documentation frameworks to be adaptable to their contexts,
integrated into their existing tools and workflows, and automated wherever
possible. Although data documentation frameworks are often motivated by
responsible AI concerns, participants did not connect the questions they were
asked to answer with their responsible AI implications. In addition,
participants often had difficulty
prioritizing the needs of dataset consumers and providing information that
someone unfamiliar with their datasets might need to know. Based on these
findings, we derive seven design requirements for future data documentation
frameworks.
DOI: 10.48550/arxiv.2206.02923