Benchmarking Pathology Feature Extractors for Whole Slide Image Classification
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Weakly supervised whole slide image classification is a key task in
computational pathology, which involves predicting a slide-level label from a
set of image patches constituting the slide. Constructing models to solve this
task involves multiple design choices, often made without robust empirical or
conclusive theoretical justification. To address this, we conduct a
comprehensive benchmarking of feature extractors to answer three critical
questions: 1) Is stain normalisation still a necessary preprocessing step? 2)
Which feature extractors are best for downstream slide-level classification? 3)
How does magnification affect downstream performance? Our study constitutes the
most comprehensive evaluation of publicly available pathology feature
extractors to date, involving more than 10,000 training runs across 14 feature
extractors, 9 tasks, 5 datasets, 3 downstream architectures, 2 levels of
magnification, and various preprocessing setups. Our findings challenge
existing assumptions: 1) We observe empirically, and by analysing the latent
space, that skipping stain normalisation and image augmentations does not
degrade performance, while significantly reducing memory and computational
demands. 2) We develop a novel evaluation metric to compare relative downstream
performance, and show that the choice of feature extractor is the most
consequential factor for downstream performance. 3) We find that
lower-magnification slides are sufficient for accurate slide-level
classification. Contrary to previous patch-level benchmarking studies, our
approach emphasises clinical relevance by focusing on slide-level biomarker
prediction tasks in a weakly supervised setting with external validation
cohorts. Our findings stand to streamline digital pathology workflows by
minimising preprocessing needs and informing the selection of feature
extractors. |
DOI: | 10.48550/arxiv.2311.11772 |