Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine
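Main authors: | Ke, Yu He; Jin, Liyuan; Elangovan, Kabilan; Ong, Bryan Wen Xi; Oh, Chin Yang; Sim, Jacqueline; Loh, Kenny Wei-Tsen; Soh, Chai Rick; Cheng, Jonathan Ming Hua; Lee, Aaron Kwang Yang; Ting, Daniel Shu Wei; Liu, Nan; Abdullah, Hairil Rizal |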
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Ke, Yu He; Jin, Liyuan; Elangovan, Kabilan; Ong, Bryan Wen Xi; Oh, Chin Yang; Sim, Jacqueline; Loh, Kenny Wei-Tsen; Soh, Chai Rick; Cheng, Jonathan Ming Hua; Lee, Aaron Kwang Yang; Ting, Daniel Shu Wei; Liu, Nan; Abdullah, Hairil Rizal |
description | Large Language Models (LLMs) are emerging as powerful tools in healthcare,
particularly for complex, domain-specific tasks. This study describes the
development and evaluation of the PErioperative AI CHatbot (PEACH), a secure
LLM-based system integrated with local perioperative guidelines to support
preoperative clinical decision-making. PEACH was embedded with 35 institutional
perioperative protocols in a secure Claude 3.5 Sonnet LLM framework within
Pair Chat (developed by the Singapore Government) and tested in a silent
deployment with real-world data. Accuracy, safety, and usability were assessed.
Deviations and hallucinations were categorized by potential harm, and user
feedback was evaluated using the Technology Acceptance Model (TAM). After the
initial silent deployment, one protocol was amended.
In 240 real-world clinical iterations, PEACH achieved a first-generation
accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across all
three iterations. The updated PEACH improved to 97.9% accuracy (235/240), which
was significantly higher than the null-hypothesis accuracy of 95% (p = 0.018,
95% CI: 0.952-0.991). Hallucinations and deviations were rare (1/240 and 2/240,
respectively). Clinicians reported that PEACH expedited decisions in 95% of
cases, and inter-rater reliability (kappa) ranged from 0.772 to 0.893 within
PEACH and from 0.610 to 0.784 among attendings.
PEACH is an accurate, adaptable tool that enhances consistency and efficiency
in perioperative decision-making. Future research should explore its
scalability across specialties and its impact on clinical outcomes. |
doi_str_mv | 10.48550/arxiv.2412.18096 |
format | Article |
date | 2024-12-23 |
rights | http://creativecommons.org/publicdomain/zero/1.0 |
oa | free_for_read |
linktorsrc | https://arxiv.org/abs/2412.18096 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.18096 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2412_18096 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence |
title | Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T18%3A58%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Real-world%20Deployment%20and%20Evaluation%20of%20PErioperative%20AI%20CHatbot%20(PEACH)%20--%20a%20Large%20Language%20Model%20Chatbot%20for%20Perioperative%20Medicine&rft.au=Ke,%20Yu%20He&rft.date=2024-12-23&rft_id=info:doi/10.48550/arxiv.2412.18096&rft_dat=%3Carxiv_GOX%3E2412_18096%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
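The abstract reports p = 0.018 and a 95% CI of 0.952-0.991 for the updated PEACH (235/240 correct, tested against a null accuracy of 95%) but does not name the statistical procedures. As a sketch under that assumption, an exact one-sided binomial test (alternative: accuracy above 95%) combined with a Wilson score interval reproduces both figures to three decimal places. The helper names `binom_test_greater` and `wilson_interval` are hypothetical, and the authors may have used a different, numerically equivalent method.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_test_greater(k: int, n: int, p0: float) -> float:
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def wilson_interval(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for the binomial proportion k/n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Figures taken from the abstract: 235 of 240 responses correct, null accuracy 95%.
correct, total, null_accuracy = 235, 240, 0.95
p_value = binom_test_greater(correct, total, null_accuracy)
low, high = wilson_interval(correct, total)
print(f"accuracy={correct / total:.3f} p={p_value:.3f} CI=({low:.3f}, {high:.3f})")
```

Running it prints `accuracy=0.979 p=0.018 CI=(0.952, 0.991)`. An exact test is a natural choice here because, with only five errors expected near the upper bound, a normal approximation to the binomial is less reliable.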