Evaluating the validity of ChatGPT responses on common obstetric issues: Potential clinical applications and implications
Published in: International Journal of Gynecology and Obstetrics, 2024-09, Vol. 166 (3), p. 1127-1133
Format: Article
Language: English
Online access: Full text
Abstract
Objective
To evaluate the quality of ChatGPT responses to common issues in obstetrics and assess its ability to provide reliable responses to pregnant individuals. The study aimed to examine the responses based on expert opinions using predetermined criteria, including “accuracy,” “completeness,” and “safety.”
Methods
We curated 15 common and potentially clinically significant questions frequently asked by pregnant women. Two native English‐speaking women were asked to reframe the questions in their own words, and we used the ChatGPT language model to generate responses to the questions. To evaluate the accuracy, completeness, and safety of the responses generated by ChatGPT, we developed a questionnaire with a 1-to-5 rating scale and invited obstetrics and gynecology experts from different countries to rate each response. The ratings were analyzed to evaluate the average level of agreement and the percentage of positive ratings (≥4) for each criterion.
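As a rough illustration of the rating analysis described above, the sketch below computes a mean rating and the percentage of positive ratings (≥4) per criterion. It is a minimal, hypothetical example: the criterion names follow the study, but the rating values and variable names are illustrative assumptions, not the study's data.

```python
# Minimal sketch of the rating analysis described in the Methods:
# for each criterion, compute the mean expert rating (1-5 scale) and
# the percentage of ratings that are "positive" (>= 4).
# The example ratings below are hypothetical, not taken from the study.

from statistics import mean

POSITIVE_THRESHOLD = 4

# Hypothetical ratings from several experts for one ChatGPT response,
# keyed by evaluation criterion.
ratings = {
    "accuracy":     [5, 4, 4, 5, 3],
    "completeness": [4, 3, 4, 3, 4],
    "safety":       [4, 4, 3, 5, 4],
}

for criterion, scores in ratings.items():
    avg = mean(scores)
    pct_positive = 100 * sum(s >= POSITIVE_THRESHOLD for s in scores) / len(scores)
    print(f"{criterion}: mean = {avg:.1f}, positive (>=4) = {pct_positive:.1f}%")
```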
Results
Of the 42 experts invited, 20 responded to the questionnaire. Across all responses, the combined mean rating was 4, with 75% of responses receiving a positive rating (≥4). Among the individual criteria, ChatGPT's responses performed best on accuracy, with a mean rating of 4.2 and 80% of questions receiving a positive rating. Responses scored lower on completeness, with a mean rating of 3.8 and 46.7% of questions receiving a positive rating. For safety, the mean rating was 3.9, with 53.3% of questions receiving a positive rating. No response received an average rating below three.
Conclusion
This study demonstrates promising results regarding the potential use of ChatGPT in providing accurate responses to obstetric clinical questions posed by pregnant women. However, it is crucial to exercise caution when addressing inquiries concerning the safety of the fetus or the mother.
Synopsis
ChatGPT demonstrated the ability to provide accurate and comprehensive responses to common obstetric questions.
ISSN: 0020-7292, 1879-3479
DOI: 10.1002/ijgo.15501