Evaluating ChatGPT’s moral competence in health care-related ethical problems

Objectives Artificial intelligence tools such as Chat Generative Pre-trained Transformer (ChatGPT) have been used for many health care-related applications; however, there is a lack of research on their capabilities for evaluating morally and/or ethically complex medical decisions. The objective of...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:JAMIA open 2024-10, Vol.7 (3), p.ooae065
Hauptverfasser: Rashid, Ahmed A, Skelly, Ryan A, Valdes, Carlos A, Patel, Pruthvi P, Solberg, Lauren B, Giordano, Christopher R, Modave, François
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Objectives Artificial intelligence tools such as Chat Generative Pre-trained Transformer (ChatGPT) have been used for many health care-related applications; however, there is a lack of research on their capabilities for evaluating morally and/or ethically complex medical decisions. The objective of this study was to assess the moral competence of ChatGPT. Materials and methods This cross-sectional study was performed between May 2023 and July 2023 using scenarios from the Moral Competence Test (MCT). Numerical responses were collected from ChatGPT 3.5 and 4.0 to assess individual and overall stage scores, including C-index and overall moral stage preference. Descriptive analysis and 2-sided Student’s t-test were used for all continuous data. Results A total of 100 iterations of the MCT were performed and moral preference was found to be higher in the latter Kohlberg-derived arguments. ChatGPT 4.0 was found to have a higher overall moral stage preference (2.325 versus 1.755) when compared to ChatGPT 3.5. ChatGPT 4.0 was also found to have a statistically higher C-index score in comparison to ChatGPT 3.5 (29.03 ± 11.10 versus 19.32 ± 10.95, P =.0000275). Discussion ChatGPT 3.5 and 4.0 trended towards higher moral preference for the latter stages of Kohlberg’s theory for both dilemmas with C-indices suggesting medium moral competence. However, both models showed moderate variation in C-index scores indicating inconsistency and further training is recommended. Conclusion ChatGPT demonstrates medium moral competence and can evaluate arguments based on Kohlberg’s theory of moral development. These findings suggest that future revisions of ChatGPT and other large language models could assist physicians in the decision-making process when encountering complex ethical scenarios. Lay Summary In this cross-sectional study, when provided with health care and non-health care scenarios from the Moral Competence Test, ChatGPT 4.0 outperformed ChatGPT 3.5 in both scenarios in terms of moral competence and overall moral stage preference, suggesting medium moral competence. Both models showed moderate variability despite no changes to the prompts, suggesting inconsistency and the need for in-house training. Future developments of large language models have the potential to assist physicians in the decision-making process when encountering a difficult ethical decision.
ISSN:2574-2531
2574-2531
DOI:10.1093/jamiaopen/ooae065