Generative artificial intelligence as a source of breast cancer information for patients: Proceed with caution
Published in: Cancer, 2025-01, Vol. 131 (1), p. e35521-n/a
Format: Article
Language: English
Online access: Full text
Abstract:
Background
This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface generative pretrained transformer (ChatGPT) 3.5 as a source of breast cancer information for patients.
Methods
Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates. These were posed to ChatGPT 3.5 in July 2023 and were repeated three times. Responses were graded in two domains: accuracy (4‐point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to physician response; 5‐point Likert scale, 5 = not similar at all). The concordance of responses with repetition was estimated using the intraclass correlation coefficient (ICC) of word counts. Response readability was calculated using the Flesch-Kincaid readability scale. References were requested and verified.
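The abstract does not specify the implementation used for the readability scoring, or whether the reading-ease or grade-level variant of the Flesch-Kincaid scale was applied. As an illustration only, a minimal sketch of the Flesch-Kincaid grade-level formula with a naive syllable heuristic might look like the following; the function names and the sample response are hypothetical, not taken from the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, with a small correction for a trailing silent 'e'."""
    word = word.lower()
    syllables = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and syllables > 1:
        syllables -= 1
    return max(syllables, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Example: score one hypothetical chatbot response.
response = ("Mammography is an X-ray examination of the breast. "
            "It is used to screen for breast cancer before symptoms appear.")
print(round(flesch_kincaid_grade(response), 1))
```

In the study design described above, each of the 20 questions was posed three times, so scoring along these lines would be run over all responses, and the per-question word counts across repetitions would then be fed into an intraclass correlation to quantify the consistency of the chatbot's answers.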
Results
The overall average accuracy was 1.88 (range 1.0–3.0; 95% confidence interval [CI], 1.42–1.94), and clinical concordance was 2.79 (range 1.0–5.0; 95% CI, 1.94–3.64). The average word count was 310 words per response (range, 146–441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59–0.91; p
ISSN: 0008-543X, 1097-0142
DOI: 10.1002/cncr.35521