Assessing the Quality and Reliability of AI-Generated Responses to Common Hypertension Queries

Bibliographic details
Published in: Curēus (Palo Alto, CA), 2024-08, Vol. 16 (8), p. e66041
Main authors: Vinufrancis, Aleena, Al Hussein, Hussein, Patel, Heena V, Nizami, Afshan, Singh, Aditya, Nunez, Bianca, Abdel-Aal, Aiah Mounir
Format: Article
Language: English
Description
Summary: The integration of artificial intelligence (AI) in healthcare, particularly through language models like ChatGPT and ChatSonic, has gained substantial attention. This article explores the use of these AI models to address patient queries related to hypertension, emphasizing their potential to enhance health literacy and disease understanding. The study aims to compare the quality and reliability of responses generated by ChatGPT and ChatSonic in addressing common patient queries about hypertension and to evaluate these AI models using the Global Quality Scale (GQS) and the Modified DISCERN scale. A virtual cross-sectional observational study was conducted over one month, starting in October 2023. Ten common patient queries regarding hypertension were presented to ChatGPT (https://chat.openai.com/) and ChatSonic (https://writesonic.com/chat), and the responses were recorded. Two internal medicine physicians assessed the responses using the GQS and the Modified DISCERN scale. Statistical analysis included Cohen's Kappa values for inter-rater agreement. The study evaluated responses from ChatGPT and ChatSonic for 10 patient queries. Assessors observed variations in the quality and reliability assessments between the two AI models. Cohen's Kappa values indicated minimal agreement between the evaluators for both the GQS and the Modified DISCERN scale. This study highlights the variations in the assessment of responses generated by ChatGPT and ChatSonic for hypertension-related queries. The findings underscore the need for ongoing monitoring and fact-checking of AI-generated responses.
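Note on the statistic: the summary reports Cohen's Kappa for agreement between the two physician raters. A minimal sketch of how that statistic can be computed is given here, assuming unweighted kappa on categorical scores. The function name `cohens_kappa` and the two score lists are hypothetical illustrations and are not data or code from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the identical score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one single category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical 1-5 scores for ten responses (NOT the study's data).
rater_1 = [5, 4, 4, 3, 5, 4, 3, 4, 5, 4]
rater_2 = [4, 4, 5, 3, 4, 4, 4, 3, 5, 5]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two raters can match on a fair share of items and still obtain a low value.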
ISSN: 2168-8184
DOI: 10.7759/cureus.66041