Unbiased evaluation of predicted gamma passing rate by an event‐mixing technique

Bibliographic details
Published in: Medical physics (Lancaster) 2024-01, Vol.51 (1), p.5-17
Main authors: Koganezawa, Akito S, Matsuura, Takaaki, Kawahara, Daisuke, Nakashima, Takeo, Shiba, Eiji, Murakami, Yuji, Nagata, Yasushi
Format: Article
Language: English
Description
Abstract:
Background: Models that predict the gamma passing rate (GPR) have been studied as a substitute for measurement-based gamma analysis. Because these studies used data from different radiotherapy systems, each comprising a treatment planning system (TPS), linear accelerator, and detector array, it has been difficult to compare the performance of prediction models among institutions with different radiotherapy systems.
Purpose: We aimed to develop unbiased scoring methods for evaluating the performance of models predicting the GPR by introducing both best and worst limits for the performance of the GPR prediction.
Methods: Two hundred head-and-neck VMAT plans were used to develop the framework. The GPRs were measured using the ArcCHECK device. The predicted GPR [$p$] was generated using a deep learning-based model [$p_{\mathrm{DL}}$]. The prediction model was evaluated using four metrics: standard deviation (SD) [$\sigma$], Pearson's correlation coefficient (CC) [$r$], mean squared error (MSE) [$s$], and mean absolute error (MAE) [$a$]. The best limit [$\sigma_m$, $r_m$, $s_m$, and $a_m$] was estimated from the SD of the measured GPR [$m$], obtained by shifting the device along the longitudinal direction to measure different sampling points. Mimicked best and worst $p$'s [$p_{\mathrm{best}}$ and $p_{\mathrm{worst}}$] were generated from $p_{\mathrm{DL}}$. The worst limit was defined such that $m$ and $p$ have no correlation [CC $\sim 0$]. The worst limit [$\sigma_{\mathrm{Mix}}$, $r_{\mathrm{Mix}}$, $s_{\mathrm{Mix}}$, and $a_{\mathrm{Mix}}$] was generated using the event-mixing (EM) technique originally introduced in high-energy physics experiments. The ranges of $\sigma$, $r$, $s$, and $a$ were defined to be $[\sigma_m, \sigma_{\mathrm{Mix}}]$, $[0, r_m]$, $[s_m, s_{\mathrm{Mix}}]$, and $[a_m, a_{\mathrm{Mix}}]$, respectively. Achievement scores (AS), each based independently on $\sigma$, $r$, $s$, or $a$, were calculated for $p_{\mathrm{DL}}$, $p_{\mathrm{best}}$, and $p_{\mathrm{worst}}$. The probability that $p$ fails the gamma analysis (alert frequency; AF) was estimated as a function of $\sigma_d$ values within the $[\sigma_m, \sigma_{\mathrm{Mix}}]$ range for the 3%/2 mm data with a 95% criterion.
Results: The SDs of the best limit were well reproduced by $\sigma_m = 0.531\sqrt{100 - m}$. The EM technique successfully generated $(m, p)$ pairs with no correlation. The AS values obtained using the four metrics showed good agreement. This agreement indicates successful definitions of both best and worst limits, consistent definitions of the AS, and successful generation of mixed events. The AF for the DL-based model with the 3%/2 mm tolerance was 31.
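The event-mixing step and the four metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: the SD is taken as the SD of the difference p - m, mixing is done by pairing each measured GPR with the prediction of a different plan, and the achievement score is a linear interpolation between the worst and best limits (the abstract does not give the AS formula). All function names are hypothetical. Only the fit $\sigma_m = 0.531\sqrt{100 - m}$ comes from the abstract.

```python
import numpy as np


def best_limit_sd(m):
    """Best-limit SD from the fit quoted in the abstract: 0.531 * sqrt(100 - m)."""
    return 0.531 * np.sqrt(100.0 - np.asarray(m, dtype=float))


def gpr_metrics(m, p):
    """Four metrics between measured (m) and predicted (p) GPR values."""
    m = np.asarray(m, dtype=float)
    p = np.asarray(p, dtype=float)
    d = p - m
    return {
        "sd": float(np.std(d, ddof=1)),        # assumption: SD of p - m
        "cc": float(np.corrcoef(m, p)[0, 1]),  # Pearson's correlation coefficient
        "mse": float(np.mean(d ** 2)),
        "mae": float(np.mean(np.abs(d))),
    }


def event_mix(m, p, rng, n_rounds=50):
    """Build uncorrelated (m, p) pairs by pairing m and p from different plans.

    Assumed mixing scheme: in each round the plans are put in a random order
    and each prediction is paired with the measurement of the preceding plan
    in that order, so no plan is ever paired with itself.
    """
    m = np.asarray(m, dtype=float)
    p = np.asarray(p, dtype=float)
    mixed_m, mixed_p = [], []
    for _ in range(n_rounds):
        order = rng.permutation(len(m))
        mixed_p.append(p[order])
        mixed_m.append(m[np.roll(order, 1)])
    return np.concatenate(mixed_m), np.concatenate(mixed_p)


def achievement_score(value, best, worst):
    """Hypothetical linear score: 0 at the worst limit, 1 at the best limit."""
    return (worst - value) / (worst - best)
```

Under these assumptions, the worst limits would be obtained as `gpr_metrics(*event_mix(m, p, rng))`, and a score for, say, the MAE as `achievement_score(a, a_m, a_mix)`. For the CC, where the worst limit is 0 and the best limit is $r_m$, the same function reduces to $r / r_m$.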
ISSN: 0094-2405, 2473-4209
DOI: 10.1002/mp.16848