Development and validation of deep learning models for identifying the brand of pedicle screws on plain spine radiographs


Bibliographic details
Published in: JOR-spine 2024-09, Vol. 7 (3), p. e70001
Authors: Yao, Yu‐Cheng, Lin, Cheng‐Li, Chen, Hung‐Hsun, Lin, Hsi‐Hsien, Hsiung, Wei, Wang, Shih‐Tien, Sun, Ying‐Chou, Tang, Yu‐Hsuan, Chou, Po‐Hsin
Format: Article
Language: English
Abstract:
Background In spinal revision surgery, previous pedicle screws (PS) may need to be replaced with new implants. Failure to accurately identify the brand of PS‐based instrumentation preoperatively may increase the risk of perioperative complications. This study aimed to develop and validate an optimal deep learning (DL) model to identify the brand of PS‐based instrumentation on plain radiographs of the spine (PRS) using anteroposterior (AP) and lateral images.
Methods A total of 529 patients who received PS‐based instrumentation from seven manufacturers were enrolled in this retrospective study. The postoperative PRS were gathered as ground truths. The training, validation, and testing datasets contained 338, 85, and 106 patients, respectively. YOLOv5 was used to crop out the screw trajectories, and the EfficientNet‐b0 model was used to develop single models (AP, Lateral, Merge, and Concatenated) based on the different PRS images. The ensemble models were different combinations of the single models. Primary outcomes were the models' performance in terms of accuracy, sensitivity, precision, F1 score, kappa value, and area under the curve (AUC). Secondary outcomes were the performance of the models relative to human readers and the external validation of the DL models.
Results The Lateral model had the most stable performance among the single models. The ensemble method improved discriminative performance. The AP + Lateral ensemble model had the most stable performance, with an accuracy of 0.9434, an F1 score of 0.9388, and an AUC of 0.9834. The performance of the ensemble models was comparable to that of experienced orthopedic surgeons and superior to that of inexperienced orthopedic surgeons. In external validation, the Lat + Concat ensemble model had the best accuracy (0.9412).
Conclusion The DL models demonstrated stable performance in identifying the brand of PS‐based instrumentation from AP and/or lateral images of PRS, which may assist orthopedic spine surgeons in preoperative revision planning in clinical practice. The AP + Lateral ensemble model had the most stable performance in terms of accuracy, F1 score, and AUC. The performance of the ensemble models was comparable to that of experienced orthopedic spine surgeons. External validation also confirmed the good performance of the DL models reported in this study.
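The abstract states that the ensemble models combine single models (for example, AP + Lateral) and are scored by accuracy, sensitivity, precision, F1 score, and kappa, but it does not say how the outputs were combined or how per-class metrics were averaged. The following is a minimal sketch under two assumptions: soft voting (averaging each single model's per-class probabilities) and macro averaging across brand classes. The function names, the three-class setup, and the probability values are hypothetical toy inputs, not data or code from the study; AUC is omitted for brevity.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-class probabilities across single models (assumed soft-voting rule)."""
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    return [
        [sum(probs[i][c] for probs in model_probs) / n_models
         for c in range(len(model_probs[0][i]))]
        for i in range(n_samples)
    ]


def argmax(row: Sequence[float]) -> int:
    """Index of the highest probability, i.e. the predicted brand class."""
    return max(range(len(row)), key=lambda c: row[c])


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[int], y_pred: List[int], n_classes: int) -> float:
    """Unweighted mean of one-vs-rest F1 scores (macro averaging is an assumption)."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as sensitivity
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / n_classes


def cohen_kappa(y_true: List[int], y_pred: List[int], n_classes: int) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in range(n_classes))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Hypothetical outputs of two single models for 4 patients and 3 brands.
ap_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.2, 0.7]]
lat_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2], [0.2, 0.1, 0.7]]
y_true = [0, 1, 1, 2]

ensemble = soft_vote([ap_probs, lat_probs])
preds = [argmax(row) for row in ensemble]
acc = accuracy(y_true, preds)
f1 = macro_f1(y_true, preds, 3)
kappa = cohen_kappa(y_true, preds, 3)
print(preds, acc, round(f1, 4), round(kappa, 4))
```

Soft voting lets a confident model outweigh a hesitant one on a given patient. Hard (majority) voting on predicted labels would be an equally plausible reading of "ensemble" and would require a tie-breaking rule when only two models are combined.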
ISSN: 2572-1143
DOI: 10.1002/jsp2.70001