Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Medical image analysis is an active research topic because of its
usefulness in clinical applications such as early disease diagnosis and
treatment. Convolutional neural networks (CNNs) have become the de facto
standard in medical image analysis because of their ability to learn complex
features from the available datasets, which lets them surpass humans in many
image-understanding tasks. In addition to CNNs, transformer architectures have
also gained popularity for medical image analysis. Despite this progress,
however, there is still room for improvement. This study applies several CNNs
and transformer-based methods with a wide range of data augmentation
techniques and evaluates them on three medical image datasets from different
modalities, comparing the performance of a vision transformer (ViT) model with
other state-of-the-art (SOTA) pre-trained CNN networks. For chest X-ray, our
vision transformer model achieved the highest F1 score of 0.9532, recall of
0.9533, Matthews correlation coefficient (MCC) of 0.9259, and ROC-AUC score of
0.97. Similarly, for the Kvasir dataset, we achieved an F1 score of 0.9436,
recall of 0.9437, MCC of 0.9360, and ROC-AUC score of 0.97. For Kvasir-Capsule,
a large-scale video capsule endoscopy (VCE) dataset, our ViT model achieved a
weighted F1 score of 0.7156, recall of 0.7182, MCC of 0.3705, and ROC-AUC
score of 0.57. Our transformer-based models were more effective than the
various CNN models at classifying different anatomical structures, findings,
and abnormalities. Our model improved over the CNN-based approaches,
suggesting that it could serve as a new benchmark for algorithm development. |
---|---|
DOI: | 10.48550/arxiv.2304.11529 |
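
The abstract describes fine-tuning pre-trained models with data augmentation for medical image classification. Below is a minimal sketch of that kind of pipeline in PyTorch, assuming an ImageNet-pretrained ViT-B/16 from torchvision; the dataset path, class count, augmentation recipe, and hyperparameters are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch: fine-tuning an ImageNet-pretrained Vision Transformer on a
# medical image dataset with data augmentation. Paths, class count, and
# hyperparameters are hypothetical, not values from the paper.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 2               # hypothetical: e.g. normal vs. pneumonia chest X-rays
DATA_DIR = "data/chest_xray"  # hypothetical folder with one train/ subdirectory per class

# Data augmentation: the paper evaluates "a wide range" of techniques; these
# crops/flips/rotations/jitter are common choices, not the authors' exact recipe.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder(f"{DATA_DIR}/train", transform=train_tf)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)

# Load a pre-trained ViT-B/16 and replace its classification head
# with a fresh linear layer sized for the medical task.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # hypothetical epoch count
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```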
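
The metrics reported in the abstract (weighted F1, recall, MCC, and ROC-AUC) can all be computed with scikit-learn. The sketch below uses hypothetical toy arrays (`y_true`, `y_pred`, `y_score`) standing in for a model's real test-set outputs.

```python
# Sketch: computing the evaluation metrics named in the abstract with
# scikit-learn on toy multi-class predictions.
import numpy as np
from sklearn.metrics import (
    f1_score,
    recall_score,
    matthews_corrcoef,
    roc_auc_score,
)

y_true = np.array([0, 1, 2, 1, 0, 2])   # ground-truth class labels (toy data)
y_pred = np.array([0, 1, 2, 0, 0, 2])   # predicted class labels
y_score = np.array([                    # per-class probabilities, rows sum to 1
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.9, 0.05, 0.05],
    [0.2, 0.1, 0.7],
])

# "weighted" averaging matches the support-weighted scores quoted above,
# which is the usual choice for imbalanced sets like Kvasir-Capsule.
print("F1 (weighted):    ", f1_score(y_true, y_pred, average="weighted"))
print("Recall (weighted):", recall_score(y_true, y_pred, average="weighted"))
print("MCC:              ", matthews_corrcoef(y_true, y_pred))
# One-vs-rest ROC-AUC handles the multi-class case.
print("ROC-AUC (OvR):    ", roc_auc_score(y_true, y_score, multi_class="ovr"))
```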