To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems

Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.
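The 'appropriate reliance' the abstract refers to is typically measured by comparing a user's initial decision, the AI's advice, and the user's final decision against the ground truth. The sketch below illustrates one common operationalization from the human-AI decision-making literature; the Trial structure, metric names, and exact definitions are illustrative assumptions, not necessarily the measures used in this paper.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    """One decision task: the user's initial answer, the AI's advice,
    the user's final answer, and the correct label.
    (Hypothetical structure for illustration.)"""
    initial: str
    advice: str
    final: str
    truth: str

def reliance_metrics(trials: list[Trial]) -> dict[str, float]:
    """Compute assumed over-/under-reliance rates on disagreement trials.

    Over-reliance: the user switched to AI advice that was wrong.
    Under-reliance: the user rejected AI advice that was correct.
    Both definitions are common in the literature but are assumptions here.
    """
    # Reliance is only observable when the user and the AI initially disagree.
    disagree = [t for t in trials if t.initial != t.advice]
    if not disagree:
        return {"over_reliance": 0.0, "under_reliance": 0.0}
    over = sum(1 for t in disagree
               if t.final == t.advice and t.advice != t.truth)
    under = sum(1 for t in disagree
                if t.final != t.advice and t.advice == t.truth)
    n = len(disagree)
    return {"over_reliance": over / n, "under_reliance": under / n}

# Hypothetical usage: two disagreement trials, one of each failure mode.
trials = [
    Trial(initial="A", advice="B", final="B", truth="A"),  # over-reliance
    Trial(initial="A", advice="B", final="A", truth="B"),  # under-reliance
]
print(reliance_metrics(trials))  # {'over_reliance': 0.5, 'under_reliance': 0.5}
```

Under this operationalization, "appropriate reliance" corresponds to driving both rates toward zero: accepting advice exactly when it is correct.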

Bibliographic Details
Published in: arXiv.org, 2024-09
Main Authors: He, Gaole; Bharos, Abri; Gadiraju, Ujwal
Format: Article
Language: English
Subjects: Collaboration; Computer Science - Artificial Intelligence; Cooperation; Debugging; Estimation; Feedback; Human performance; Intervention; Machine learning; Performance evaluation; Trustworthiness
Online Access: Full text
DOI: 10.48550/arxiv.2409.14377
EISSN: 2331-8422
Source: arXiv.org; Free E-Journals
Publisher: Cornell University Library, arXiv.org (Ithaca)
Related version: https://doi.org/10.1145/3648188.3675141 (published paper; access to full text may be restricted)
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0