Predicting Generalization of AI Colonoscopy Models to Unseen Data
Main authors: | , , , , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Abstract: | $\textbf{Background}$: Generalizability of AI colonoscopy algorithms is
important for wider adoption in clinical practice. However, current techniques
for evaluating performance on unseen data require expensive and time-intensive
labels.
$\textbf{Methods}$: We use a "Masked Siamese Network" (MSN) to identify novel
phenomena in unseen data and to predict polyp detector performance. MSN is
trained without any labels to predict masked-out regions of polyp images. We
test whether MSN, trained only on data from Israel, can detect unseen
techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies
from Japan (354 videos, 128 hours). We also test MSN's ability to predict the
performance of Computer-Aided Detection (CADe) of polyps on colonoscopies from
both countries, even though MSN is not trained on data from Japan.
$\textbf{Results}$: Using the label-free Fréchet distance, MSN correctly
identifies NBI and CE as less similar to Israel white-light data than Japan
white-light data is (bootstrapped z-test, $|z| > 496$, $p < 10^{-8}$ for both).
MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs.
79% accuracy) despite being trained only on white-light data, and is the only
method that is robust to noisy labels. MSN predicts CADe polyp detector
performance on in-domain Israel and out-of-domain Japan colonoscopies (r = 0.79
and r = 0.37, respectively). When trained on a few examples of Japan detector
performance, MSN's prediction of Japan performance improves (r = 0.56).
$\textbf{Conclusion}$: Our technique can identify distribution shifts in
clinical data and can predict CADe detector performance on unseen data, without
labels. Our self-supervised approach can help detect when data in practice
differs from the training data, for example between hospitals, or when data has
meaningfully shifted from training. MSN has potential for application to medical
image domains beyond colonoscopy. |
---|---|
DOI: | 10.48550/arxiv.2403.09920 |
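The Results rely on a label-free Fréchet distance and a bootstrapped z-test to compare image distributions. The sketch below shows one common way to compute both on embedding vectors, assuming each image is already represented by a fixed-length embedding (for example, from an encoder such as MSN) and that the Fréchet distance takes the Gaussian form familiar from FID: $\mathrm{FD} = \lVert \mu_x - \mu_y \rVert^2 + \operatorname{Tr}\left(\Sigma_x + \Sigma_y - 2(\Sigma_x \Sigma_y)^{1/2}\right)$. The function names `frechet_distance` and `bootstrap_z`, the resampling scheme, `n_boot`, and the synthetic data are hypothetical illustrations, not the authors' code; the abstract does not describe how the bootstrap was performed.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to embeddings x (n, d) and y (m, d)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # atleast_2d keeps the d == 1 case as a (1, 1) covariance matrix.
    s_x = np.atleast_2d(np.cov(x, rowvar=False))
    s_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((S_x S_y)^{1/2}) equals the sum of square roots of the eigenvalues of the
    # symmetric PSD matrix S_x^{1/2} S_y S_x^{1/2}, avoiding a non-symmetric sqrtm.
    root_x = _psd_sqrt(s_x)
    middle = root_x @ s_y @ root_x
    eig = np.clip(np.linalg.eigvalsh((middle + middle.T) / 2.0), 0.0, None)
    cross = np.sqrt(eig).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(s_x) + np.trace(s_y) - 2.0 * cross)


def _resample(rng: np.random.Generator, data: np.ndarray) -> np.ndarray:
    """Draw len(data) rows with replacement (hypothetical bootstrap helper)."""
    return data[rng.integers(0, len(data), size=len(data))]


def bootstrap_z(reference, shifted, comparison, n_boot=200, seed=0) -> float:
    """Observed FD gap divided by its bootstrap standard error.

    gap = FD(reference, shifted) - FD(reference, comparison); a large positive z
    means `shifted` lies farther from `reference` than `comparison` does.
    """
    rng = np.random.default_rng(seed)
    observed = frechet_distance(reference, shifted) - frechet_distance(reference, comparison)
    gaps = np.empty(n_boot)
    for i in range(n_boot):
        r = _resample(rng, reference)  # resample all three sets independently
        gaps[i] = frechet_distance(r, _resample(rng, shifted)) - frechet_distance(
            r, _resample(rng, comparison)
        )
    return float(observed / gaps.std(ddof=1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    israel_wl = rng.normal(size=(300, 8))           # hypothetical reference embeddings
    japan_wl = rng.normal(size=(300, 8))            # drawn from the same distribution
    japan_nbi = rng.normal(loc=1.0, size=(300, 8))  # mean-shifted distribution
    print(frechet_distance(israel_wl, japan_nbi), frechet_distance(israel_wl, japan_wl))
    print(bootstrap_z(israel_wl, japan_nbi, japan_wl, n_boot=100))
```

In this toy setup, `japan_nbi` plays the role of an unseen technique and `japan_wl` the role of in-distribution white-light data, so a positive z mirrors the direction of the comparison reported above. Real embeddings are high-dimensional, so in practice the covariance estimates need enough samples per set to be well conditioned.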