Evaluation of GPT-4 Concordance with North American Spine Society Guidelines for Lumbar Fusion Surgery

Bibliographic Details
Published in:	North American Spine Society journal (NASSJ) 2024-12, p. 100580, Article 100580
Main authors:	Khoylyan, Ara; Salvato, Jason; Vazquez, Frank; Girgis, Mina; Tang, Alex; Chen, Tan
Format:	Article
Language:	English
Subjects:	Artificial intelligence; ChatGPT; degenerative disc disease; large language models; lumbar fusion surgery; NASS guidelines
Online access:	Full text

Description
Summary:	Concordance with evidence-based medicine (EBM) guidelines is associated with improved clinical outcomes in spine surgery. The North American Spine Society (NASS) has published coverage guidelines on indications for lumbar fusion surgery, and a recent survey demonstrated a 60% concordance rate across its members. GPT-4 is a popular deep learning model trained on public databases, including those containing EBM guidelines. Prior research has explored the potential utility of artificial intelligence (AI) software in adherence to spine surgery practices and guidelines, which invites further investigation of current AI models in the setting of lumbar fusion surgery.
Seventeen well-validated clinical vignettes with specific indications for or against lumbar fusion based on NASS criteria were obtained from a previously published study. Each case was transcribed into a standardized prompt and entered into GPT-4 to obtain a decision on whether fusion is indicated. Inter-query reliability was assessed with serial identical queries using the Fleiss' kappa statistic. The majority response among serial queries was taken as the final GPT-4 decision. Queries were all entered in separate strings. The investigator entering the prompts was blinded to the NASS-concordant decisions for the cases until data collection was complete. Decisions by GPT-4 and NASS guidelines were compared with chi-square analysis.
GPT-4 responses for 15/17 (88.2%) of the clinical vignettes were in concordance with NASS EBM lumbar fusion guidelines. There was a significant association between GPT-4 and NASS guidelines in clinical decision-making when determining indication for spine fusion surgery (χ² = 9.75; p < 0.01). There was substantial agreement among the sets of responses generated by GPT-4 for each clinical case (κ = 0.71; p < 0.001).
There is significant concordance between GPT-4 responses and NASS EBM indications for lumbar fusion surgery. AI and deep learning models may prove to be an effective adjunct tool for clinical decision-making within modern spine surgery practices.
ISSN:	2666-5484
DOI:	10.1016/j.xnsj.2024.100580
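
The summary describes two steps that are easy to misread in prose: taking the majority response among serial identical queries as the final decision, and measuring inter-query reliability with Fleiss' kappa. The following is a minimal sketch of both, not the authors' analysis code. The function names, the response labels, and the toy data are hypothetical, and the number of repeated queries per vignette (three here) is an assumption, since the record does not state it.

```python
from collections import Counter


def majority_decision(responses):
    """Return the most frequent response among repeated queries for one case.

    Assumes an odd number of binary responses, so no tie can occur.
    """
    return Counter(responses).most_common(1)[0][0]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-case response lists.

    ratings: one list per case, each holding the same number of responses.
    categories: every label that may appear in the responses.
    """
    n_cases = len(ratings)
    n_queries = len(ratings[0])
    if n_queries < 2 or any(len(r) != n_queries for r in ratings):
        raise ValueError("each case needs the same number (>= 2) of responses")

    # counts[i][j] = how many responses for case i fell into category j
    counts = [[Counter(r)[c] for c in categories] for r in ratings]

    # Observed agreement per case, then averaged over cases
    per_case = [
        (sum(c * c for c in row) - n_queries) / (n_queries * (n_queries - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Agreement expected by chance, from the overall category proportions
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_queries)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: three cases, three repeated queries each.
toy_ratings = [
    ["fusion", "fusion", "fusion"],
    ["no fusion", "no fusion", "no fusion"],
    ["fusion", "fusion", "no fusion"],
]
labels = ["fusion", "no fusion"]

final_decisions = [majority_decision(r) for r in toy_ratings]
kappa = fleiss_kappa(toy_ratings, labels)
```

The toy data are illustrative only and do not reproduce the study's vignettes or its reported κ. In practice, a maintained implementation such as the one in statsmodels would likely be preferable to a hand-rolled function.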