Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine

Bibliographic details
Main authors:	Ke, Yu He; Jin, Liyuan; Elangovan, Kabilan; Ong, Bryan Wen Xi; Oh, Chin Yang; Sim, Jacqueline; Loh, Kenny Wei-Tsen; Soh, Chai Rick; Cheng, Jonathan Ming Hua; Lee, Aaron Kwang Yang; Ting, Daniel Shu Wei; Liu, Nan; Abdullah, Hairil Rizal
Format:	Article
Language:	English
Subjects:	Computer Science - Artificial Intelligence

creator	Ke, Yu He; Jin, Liyuan; Elangovan, Kabilan; Ong, Bryan Wen Xi; Oh, Chin Yang; Sim, Jacqueline; Loh, Kenny Wei-Tsen; Soh, Chai Rick; Cheng, Jonathan Ming Hua; Lee, Aaron Kwang Yang; Ting, Daniel Shu Wei; Liu, Nan; Abdullah, Hairil Rizal
description	Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH embeds 35 institutional perioperative protocols within the secure Claude 3.5 Sonnet LLM framework of Pair Chat (developed by the Singapore Government) and was tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized by potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). After the initial silent deployment, one protocol was amended. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH achieved an accuracy of 97.9% (235/240), which differed significantly from the null-hypothesis accuracy of 95% (p = 0.018, 95% CI: 0.952-0.991). Hallucinations and deviations were rare (1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability (kappa) ranged from 0.772 to 0.893 within PEACH and from 0.610 to 0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. Future research should explore its scalability across specialties and its impact on clinical outcomes.
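The description reports accuracies as counts (78/80, 232/240, 235/240) and compares 235/240 against a null-hypothesis accuracy of 95%, but it does not name the test or the interval method. The sketch below is a minimal illustration that assumes an exact one-sided binomial test (H1: accuracy > 0.95) and a Wilson score interval; the function names `accuracy`, `binom_test_greater`, and `wilson_interval` are hypothetical and are not taken from the study.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of correct responses, e.g. 235/240."""
    return correct / total


def binom_test_greater(k: int, n: int, p0: float) -> float:
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, p0).

    Assumption: the study's test is one-sided against p0; this is not confirmed.
    """
    return sum(math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def wilson_interval(k: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (~95% by default).

    Assumption: the study's CI method is Wilson; this is not confirmed.
    """
    p_hat = k / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Counts come from the description; the statistical methods are assumptions.
    runs = [
        ("first generation", 78, 80),
        ("overall, three iterations", 232, 240),
        ("updated PEACH", 235, 240),
    ]
    for label, k, n in runs:
        print(f"{label}: {accuracy(k, n):.1%}")

    p = binom_test_greater(235, 240, 0.95)
    lo, hi = wilson_interval(235, 240)
    print(f"one-sided p = {p:.3f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

Under these assumptions the script prints p = 0.018 and a 95% CI of 0.952-0.991 for 235/240, consistent with the reported figures; that agreement suggests, but does not establish, which methods were used.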
doi_str_mv	10.48550/arxiv.2412.18096
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_18096</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_18096</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_180963</originalsourceid><addsrcrecordid>eNqFjr0OgkAQhK-xMOoDWLmlFiAgGCwJYjDRhBh7ssqClxx35ETUB_C9xZ_CzmZmMplMPsaGtmW6vudZU9Q33piOazum7VuLeZc9doTCuCotMlhSJdS9JFkDygyiBsUFa64kqBySSHNVkW6LhiBYQxhjfVA1jJMoCOMJGAYgbFAX1KosLtiGrcpIQHj6LHOlIaHfmy1l_Mgl9VknR3Gmwdd7bLSK9mFsvIHTSvMS9T19gadv8Nn_xROUZk26</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine</title><source>arXiv.org</source><creator>Ke, Yu He ; Jin, Liyuan ; Elangovan, Kabilan ; Ong, Bryan Wen Xi ; Oh, Chin Yang ; Sim, Jacqueline ; Loh, Kenny Wei-Tsen ; Soh, Chai Rick ; Cheng, Jonathan Ming Hua ; Lee, Aaron Kwang Yang ; Ting, Daniel Shu Wei ; Liu, Nan ; Abdullah, Hairil Rizal</creator><creatorcontrib>Ke, Yu He ; Jin, Liyuan ; Elangovan, Kabilan ; Ong, Bryan Wen Xi ; Oh, Chin Yang ; Sim, Jacqueline ; Loh, Kenny Wei-Tsen ; Soh, Chai Rick ; Cheng, Jonathan Ming Hua ; Lee, Aaron Kwang Yang ; Ting, Daniel Shu Wei ; Liu, Nan ; Abdullah, Hairil Rizal</creatorcontrib><description>Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. 
Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (both 1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. 
Future research should explore its scalability across specialties and its impact on clinical outcomes.</description><identifier>DOI: 10.48550/arxiv.2412.18096</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence</subject><creationdate>2024-12</creationdate><rights>http://creativecommons.org/publicdomain/zero/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.18096$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.18096$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ke, Yu He</creatorcontrib><creatorcontrib>Jin, Liyuan</creatorcontrib><creatorcontrib>Elangovan, Kabilan</creatorcontrib><creatorcontrib>Ong, Bryan Wen Xi</creatorcontrib><creatorcontrib>Oh, Chin Yang</creatorcontrib><creatorcontrib>Sim, Jacqueline</creatorcontrib><creatorcontrib>Loh, Kenny Wei-Tsen</creatorcontrib><creatorcontrib>Soh, Chai Rick</creatorcontrib><creatorcontrib>Cheng, Jonathan Ming Hua</creatorcontrib><creatorcontrib>Lee, Aaron Kwang Yang</creatorcontrib><creatorcontrib>Ting, Daniel Shu Wei</creatorcontrib><creatorcontrib>Liu, Nan</creatorcontrib><creatorcontrib>Abdullah, Hairil Rizal</creatorcontrib><title>Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine</title><description>Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. 
This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (both 1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. 
Future research should explore its scalability across specialties and its impact on clinical outcomes.</description><subject>Computer Science - Artificial Intelligence</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjr0OgkAQhK-xMOoDWLmlFiAgGCwJYjDRhBh7ssqClxx35ETUB_C9xZ_CzmZmMplMPsaGtmW6vudZU9Q33piOazum7VuLeZc9doTCuCotMlhSJdS9JFkDygyiBsUFa64kqBySSHNVkW6LhiBYQxhjfVA1jJMoCOMJGAYgbFAX1KosLtiGrcpIQHj6LHOlIaHfmy1l_Mgl9VknR3Gmwdd7bLSK9mFsvIHTSvMS9T19gadv8Nn_xROUZk26</recordid><startdate>20241223</startdate><enddate>20241223</enddate><creator>Ke, Yu He</creator><creator>Jin, Liyuan</creator><creator>Elangovan, Kabilan</creator><creator>Ong, Bryan Wen Xi</creator><creator>Oh, Chin Yang</creator><creator>Sim, Jacqueline</creator><creator>Loh, Kenny Wei-Tsen</creator><creator>Soh, Chai Rick</creator><creator>Cheng, Jonathan Ming Hua</creator><creator>Lee, Aaron Kwang Yang</creator><creator>Ting, Daniel Shu Wei</creator><creator>Liu, Nan</creator><creator>Abdullah, Hairil Rizal</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241223</creationdate><title>Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine</title><author>Ke, Yu He ; Jin, Liyuan ; Elangovan, Kabilan ; Ong, Bryan Wen Xi ; Oh, Chin Yang ; Sim, Jacqueline ; Loh, Kenny Wei-Tsen ; Soh, Chai Rick ; Cheng, Jonathan Ming Hua ; Lee, Aaron Kwang Yang ; Ting, Daniel Shu Wei ; Liu, Nan ; Abdullah, Hairil Rizal</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_180963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><toplevel>online_resources</toplevel><creatorcontrib>Ke, Yu He</creatorcontrib><creatorcontrib>Jin, 
Liyuan</creatorcontrib><creatorcontrib>Elangovan, Kabilan</creatorcontrib><creatorcontrib>Ong, Bryan Wen Xi</creatorcontrib><creatorcontrib>Oh, Chin Yang</creatorcontrib><creatorcontrib>Sim, Jacqueline</creatorcontrib><creatorcontrib>Loh, Kenny Wei-Tsen</creatorcontrib><creatorcontrib>Soh, Chai Rick</creatorcontrib><creatorcontrib>Cheng, Jonathan Ming Hua</creatorcontrib><creatorcontrib>Lee, Aaron Kwang Yang</creatorcontrib><creatorcontrib>Ting, Daniel Shu Wei</creatorcontrib><creatorcontrib>Liu, Nan</creatorcontrib><creatorcontrib>Abdullah, Hairil Rizal</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ke, Yu He</au><au>Jin, Liyuan</au><au>Elangovan, Kabilan</au><au>Ong, Bryan Wen Xi</au><au>Oh, Chin Yang</au><au>Sim, Jacqueline</au><au>Loh, Kenny Wei-Tsen</au><au>Soh, Chai Rick</au><au>Cheng, Jonathan Ming Hua</au><au>Lee, Aaron Kwang Yang</au><au>Ting, Daniel Shu Wei</au><au>Liu, Nan</au><au>Abdullah, Hairil Rizal</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine</atitle><date>2024-12-23</date><risdate>2024</risdate><abstract>Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. 
Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (both 1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. Future research should explore its scalability across specialties and its impact on clinical outcomes.</abstract><doi>10.48550/arxiv.2412.18096</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2412.18096
language	eng
recordid	cdi_arxiv_primary_2412_18096
source	arXiv.org
subjects	Computer Science - Artificial Intelligence
title	Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T18%3A58%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Real-world%20Deployment%20and%20Evaluation%20of%20PErioperative%20AI%20CHatbot%20(PEACH)%20--%20a%20Large%20Language%20Model%20Chatbot%20for%20Perioperative%20Medicine&rft.au=Ke,%20Yu%20He&rft.date=2024-12-23&rft_id=info:doi/10.48550/arxiv.2412.18096&rft_dat=%3Carxiv_GOX%3E2412_18096%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true