Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparenc...

Bibliographic details
Main authors:	Heger, Amy K; Marquis, Liz B; Vorvoreanu, Mihaela; Wallach, Hanna; Vaughan, Jennifer Wortman
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Human-Computer Interaction

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Heger, Amy K; Marquis, Liz B; Vorvoreanu, Mihaela; Wallach, Hanna; Vaughan, Jennifer Wortman
description	Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru, 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Despite the fact that data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks.
doi_str_mv	10.48550/arxiv.2206.02923
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2206_02923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2206_02923</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-5a9cc5ccb7bb610c288703f26d29efc0798e4e36e81c9a2dbb137ff6bf1ba9f13</originalsourceid><addsrcrecordid>eNotjztPxDAQhN1QoIMfQIU7GhL8uDhxiRJeUoArjjpaO-s7Sznn5AQE_x7noJrRaHZWHyFXnOXrqijYHcRv_5ULwVTOhBbynIwfocc4zRB6H3b0FezeB6QtQgxLsIlgZz_7MaTWDW1gBtqM9vOAYYYlphuMFo-LnW7pG2KfpN7DMGDYYfJpmTY4-fQmHV-QMwfDhJf_uiLbx4dt_Zy1708v9X2bgSplVoC2trDWlMYozqyoqpJJJ1QvNDrLSl3hGqXCilsNojeGy9I5ZRw3oB2XK3L9N3si7o7RHyD-dAt5dyKXv83JVd0</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata</title><source>arXiv.org</source><creator>Heger, Amy K ; Marquis, Liz B ; Vorvoreanu, Mihaela ; Wallach, Hanna ; Vaughan, Jennifer Wortman</creator><creatorcontrib>Heger, Amy K ; Marquis, Liz B ; Vorvoreanu, Mihaela ; Wallach, Hanna ; Vaughan, Jennifer Wortman</creatorcontrib><description>Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. 
To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru, 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Despite the fact that data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. 
Based on these findings, we derive seven design requirements for future data documentation frameworks.</description><identifier>DOI: 10.48550/arxiv.2206.02923</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Human-Computer Interaction</subject><creationdate>2022-06</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2206.02923$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2206.02923$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Heger, Amy K</creatorcontrib><creatorcontrib>Marquis, Liz B</creatorcontrib><creatorcontrib>Vorvoreanu, Mihaela</creatorcontrib><creatorcontrib>Wallach, Hanna</creatorcontrib><creatorcontrib>Vaughan, Jennifer Wortman</creatorcontrib><title>Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata</title><description>Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. 
However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru, 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Despite the fact that data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. 
Based on these findings, we derive seven design requirements for future data documentation frameworks.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Human-Computer Interaction</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotjztPxDAQhN1QoIMfQIU7GhL8uDhxiRJeUoArjjpaO-s7Sznn5AQE_x7noJrRaHZWHyFXnOXrqijYHcRv_5ULwVTOhBbynIwfocc4zRB6H3b0FezeB6QtQgxLsIlgZz_7MaTWDW1gBtqM9vOAYYYlphuMFo-LnW7pG2KfpN7DMGDYYfJpmTY4-fQmHV-QMwfDhJf_uiLbx4dt_Zy1708v9X2bgSplVoC2trDWlMYozqyoqpJJJ1QvNDrLSl3hGqXCilsNojeGy9I5ZRw3oB2XK3L9N3si7o7RHyD-dAt5dyKXv83JVd0</recordid><startdate>20220606</startdate><enddate>20220606</enddate><creator>Heger, Amy K</creator><creator>Marquis, Liz B</creator><creator>Vorvoreanu, Mihaela</creator><creator>Wallach, Hanna</creator><creator>Vaughan, Jennifer Wortman</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20220606</creationdate><title>Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata</title><author>Heger, Amy K ; Marquis, Liz B ; Vorvoreanu, Mihaela ; Wallach, Hanna ; Vaughan, Jennifer Wortman</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-5a9cc5ccb7bb610c288703f26d29efc0798e4e36e81c9a2dbb137ff6bf1ba9f13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Human-Computer Interaction</topic><toplevel>online_resources</toplevel><creatorcontrib>Heger, Amy K</creatorcontrib><creatorcontrib>Marquis, Liz B</creatorcontrib><creatorcontrib>Vorvoreanu, Mihaela</creatorcontrib><creatorcontrib>Wallach, Hanna</creatorcontrib><creatorcontrib>Vaughan, Jennifer Wortman</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Heger, Amy K</au><au>Marquis, Liz B</au><au>Vorvoreanu, Mihaela</au><au>Wallach, Hanna</au><au>Vaughan, Jennifer Wortman</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata</atitle><date>2022-06-06</date><risdate>2022</risdate><abstract>Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru, 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. 
Despite the fact that data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks.</abstract><doi>10.48550/arxiv.2206.02923</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2206.02923
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2206_02923
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Human-Computer Interaction
title	Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T00%3A29%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Understanding%20Machine%20Learning%20Practitioners'%20Data%20Documentation%20Perceptions,%20Needs,%20Challenges,%20and%20Desiderata&rft.au=Heger,%20Amy%20K&rft.date=2022-06-06&rft_id=info:doi/10.48550/arxiv.2206.02923&rft_dat=%3Carxiv_GOX%3E2206_02923%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true