From modern CNNs to vision transformers: Assessing the performance, robustness, and classification strategies of deep learning models in histopathology

Bibliographic details
Published in: Medical Image Analysis, 2023-07, Vol. 87, p. 102809, Article 102809
Main authors: Springenberg, Maximilian; Frommholz, Annika; Wenzel, Markus; Weicken, Eva; Ma, Jackie; Strodthoff, Nils
Format: Article
Language: English
Description
Abstract: While machine learning is currently transforming the field of histopathology, the domain lacks a comprehensive evaluation of state-of-the-art models based on essential but complementary quality requirements beyond mere classification accuracy. To fill this gap, we developed a new methodology to extensively evaluate a wide range of classification models, including recent vision transformers and convolutional neural networks such as ConvNeXt, ResNet (BiT), Inception, ViT, and Swin transformer, with and without supervised or self-supervised pretraining. We thoroughly tested the models on five widely used histopathology datasets containing whole slide images of breast, gastric, and colorectal cancer and developed a novel approach using an image-to-image translation model to assess the robustness of a cancer classification model against stain variations. Further, we extended existing interpretability methods to previously unstudied models and systematically revealed insights into the models’ classification strategies that allow for plausibility checks and systematic comparisons. The study resulted in specific model recommendations for practitioners and put forward a general methodology to quantify a model’s quality according to complementary requirements that can be transferred to future model architectures.

• We propose a thorough methodology to compare the performance of different models.
• We quantify the focus of cancer classification models on tissue segments.
• We developed a novel methodology to assess robustness against staining variations.
• Featured models focus on similar tissue segments and are not sufficiently robust.
• Through thorough evaluation, ConvNeXt-L established itself as the best model.
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2023.102809