To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems
Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes the dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.
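The abstract's central construct, 'appropriate reliance', is commonly operationalized by comparing a participant's initial and final decisions against the AI's advice and the ground truth: relying on the AI when it is correct, and self-relying when it is wrong. The sketch below illustrates one such operationalization from the broader appropriate-reliance literature; the metric names (RAIR, RSR), the `Trial` structure, and the `reliance_metrics` helper are illustrative assumptions, not code or definitions taken from this paper.

```python
# A minimal sketch (not the authors' code) of one common way to quantify
# appropriate reliance in AI-assisted decision making. Each trial records
# the participant's initial decision, the AI's advice, the participant's
# final decision, and the ground truth.

from dataclasses import dataclass

@dataclass
class Trial:
    initial: str   # participant's decision before seeing AI advice
    advice: str    # the AI system's recommendation
    final: str     # participant's decision after seeing AI advice
    truth: str     # ground-truth label

def reliance_metrics(trials: list[Trial]) -> dict[str, float]:
    """Relative AI reliance (RAIR): of trials where the AI was right and the
    participant was initially wrong, how often did they switch to the AI?
    Relative self-reliance (RSR): of trials where the AI was wrong and the
    participant was initially right, how often did they keep their answer?"""
    switch_to_correct_ai = [t.final == t.advice
                            for t in trials
                            if t.advice == t.truth and t.initial != t.truth]
    resist_wrong_ai = [t.final == t.initial
                       for t in trials
                       if t.advice != t.truth and t.initial == t.truth]
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return {"RAIR": mean(switch_to_correct_ai), "RSR": mean(resist_wrong_ai)}

# Toy usage: two trials with correct AI advice, one with wrong advice.
trials = [
    Trial(initial="A", advice="B", final="B", truth="B"),  # switched to correct AI
    Trial(initial="A", advice="B", final="A", truth="B"),  # under-reliance
    Trial(initial="C", advice="D", final="C", truth="C"),  # resisted wrong AI
]
print(reliance_metrics(trials))  # {'RAIR': 0.5, 'RSR': 1.0}
```

Under this reading, the paper's observed "decrease in reliance" after the debugging intervention would show up as a drop in RAIR without a compensating gain in RSR.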
Published in: | arXiv.org, 2024-09 |
---|---|
Main authors: | He, Gaole; Bharos, Abri; Gadiraju, Ujwal |
Format: | Article |
Language: | English |
Subjects: | Collaboration; Computer Science - Artificial Intelligence; Cooperation; Debugging; Estimation; Feedback; Human performance; Intervention; Machine learning; Performance evaluation; Trustworthiness |
Online access: | Full text |
DOI: | 10.48550/arxiv.2409.14377 |
EISSN: | 2331-8422 |
Published: | 2024-09-22 |
Publisher: | Ithaca: Cornell University Library, arXiv.org |
Rights: | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Published version: | https://doi.org/10.1145/3648188.3675141 |