To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems

Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.
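The 'appropriate reliance' the abstract refers to is typically measured by comparing a user's initial decision, the AI's advice, and the user's final decision against the ground truth. The sketch below illustrates one common operationalization from the human-AI decision-making literature; the Trial structure, metric names, and exact definitions are illustrative assumptions, not necessarily the measures used in this paper.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    """One decision task: the user's initial answer, the AI's advice,
    the user's final answer, and the correct label.
    (Hypothetical structure for illustration.)"""
    initial: str
    advice: str
    final: str
    truth: str

def reliance_metrics(trials: list[Trial]) -> dict[str, float]:
    """Compute assumed over-/under-reliance rates on disagreement trials.

    Over-reliance: the user switched to AI advice that was wrong.
    Under-reliance: the user rejected AI advice that was correct.
    Both definitions are common in the literature but are assumptions here.
    """
    # Reliance is only observable when the user and the AI initially disagree.
    disagree = [t for t in trials if t.initial != t.advice]
    if not disagree:
        return {"over_reliance": 0.0, "under_reliance": 0.0}
    over = sum(1 for t in disagree
               if t.final == t.advice and t.advice != t.truth)
    under = sum(1 for t in disagree
                if t.final != t.advice and t.advice == t.truth)
    n = len(disagree)
    return {"over_reliance": over / n, "under_reliance": under / n}

# Hypothetical usage: two disagreement trials, one of each failure mode.
trials = [
    Trial(initial="A", advice="B", final="B", truth="A"),  # over-reliance
    Trial(initial="A", advice="B", final="A", truth="B"),  # under-reliance
]
print(reliance_metrics(trials))  # {'over_reliance': 0.5, 'under_reliance': 0.5}
```

Under this operationalization, "appropriate reliance" corresponds to driving both rates toward zero: accepting advice exactly when it is correct.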

Bibliographic Details
Published in: arXiv.org, 2024-09
Main Authors: He, Gaole; Bharos, Abri; Gadiraju, Ujwal
Format: Article
Language: English
Subjects: Collaboration; Computer Science - Artificial Intelligence; Cooperation; Debugging; Estimation; Feedback; Human performance; Intervention; Machine learning; Performance evaluation; Trustworthiness
Online Access: Full text
DOI: 10.48550/arxiv.2409.14377
EISSN: 2331-8422
Source: arXiv.org; Free E-Journals
Publisher: Cornell University Library, arXiv.org (Ithaca)
Related version: https://doi.org/10.1145/3648188.3675141 (published paper; access to full text may be restricted)
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0