A novel Deeplabv3+ and vision-based transformer model for segmentation and classification of skin lesions
Published in: Biomedical Signal Processing and Control, June 2024, Vol. 92, p. 106084, Article 106084
Format: Article
Language: English
Online access: Full text
Abstract:
• For more accurate segmentation, the DeepLabv3+ model is trained with hyperparameters selected after analysis: batch size 32, 32 filters, 3 channels, 8 classes, 100 epochs, and the Adam optimizer.
• A vision transformer (ViT) model is proposed for SL classification because, through self-attention, the transformer exchanges information among distinct locations of the image from the first layer onward. The model is trained from scratch with selected hyperparameters: weight decay 0.0001, patch size 7, learning rate 0.001, 100 epochs, projection dimension 64, 4 heads, 8 transformer layers, and MLP head units [56, 28].
Skin cancer (SC) is a common disease caused by ultraviolet radiation. Accurate SC detection is hindered by artifacts such as variations in lesion shape, size, color, and texture, as well as hairs, poor contrast, poor brightness, and irregular lesion boundaries. To address these limitations, a deep learning-based technique is proposed that consists of segmentation and classification of SC. A DeepLabv3+ segmentation model is designed that consists of 9 convolutional neural network blocks, comprising in total 19 convolution, 18 rectified linear unit, and 18 batch normalization layers. The model is evaluated on the ISIC-16, ISIC-17, ISIC-18, and PH2 datasets, achieving accuracies of 98.90 %, 98.38 %, 99.45 %, and 100 %, respectively. A Vision Transformer (ViT) model is then developed for the classification of skin lesions (SL). The ViT model performs better than a CNN because it processes the image as a sequence of patch tokens, whereas a CNN operates pixel by pixel. The ViT model consists of eight blocks, comprising in total 17 normalization, 8 multi-head attention, 19 dense, and 19 dropout layers, with a 7×7 patch size. The model is evaluated on the PH2, ISIC-19, ISIC-20, and HAM10000 datasets, achieving accuracies of 100 %, 96.97 %, 97.73 %, and 100 %, respectively. The results are better than those of existing methods.
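As an illustration of the reported segmentation settings, the following is a minimal sketch of a DeepLabv3+-style model in Keras, wired to the hyperparameters stated above (32 filters, 3 input channels, 8 classes, Adam, batch size 32, 100 epochs). The input resolution, encoder depth, atrous rates, and loss function are assumptions, and the hypothetical `build_segmenter` is a simplified stand-in, not the authors' exact 9-block network.

```python
# Minimal, simplified DeepLabv3+-style segmenter in Keras. Reported values:
# 32 filters, 3 input channels, 8 classes, Adam, batch size 32, 100 epochs.
# Input resolution, encoder depth, atrous rates, and loss are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size=3, dilation_rate=1):
    """Convolution -> batch normalization -> ReLU, the basic block unit."""
    x = layers.Conv2D(filters, kernel_size, padding="same",
                      dilation_rate=dilation_rate, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_segmenter(input_shape=(256, 256, 3), filters=32, num_classes=8):
    inputs = layers.Input(shape=input_shape)
    # Small strided encoder (stand-in for the paper's CNN backbone).
    x = conv_bn_relu(inputs, filters)
    low = conv_bn_relu(x, filters)          # low-level features for the decoder
    x = layers.MaxPooling2D(4)(low)
    x = conv_bn_relu(x, filters * 2)
    # Atrous spatial pyramid pooling, the defining DeepLabv3+ component.
    aspp = [conv_bn_relu(x, filters * 2, dilation_rate=r) for r in (1, 6, 12)]
    x = conv_bn_relu(layers.Concatenate()(aspp), filters * 2, kernel_size=1)
    # Decoder: upsample, fuse with low-level features, predict per-pixel class.
    x = layers.UpSampling2D(4, interpolation="bilinear")(x)
    x = conv_bn_relu(layers.Concatenate()([x, low]), filters)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_segmenter()
model.compile(optimizer=tf.keras.optimizers.Adam(),      # Adam, as reported
              loss="sparse_categorical_crossentropy")    # assumed per-pixel loss
# model.fit(x_train, y_train, batch_size=32, epochs=100) # reported settings
```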
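Similarly, the ViT hyperparameters listed in the highlights map onto a standard patch-token classifier. The sketch below is a generic pre-norm ViT in Keras using the reported values (7×7 patches, projection dimension 64, 4 heads, 8 transformer layers, MLP head units [56, 28], learning rate 0.001, weight decay 0.0001, 100 epochs); the image size, class count, dropout rates, and the choice of AdamW to apply the reported weight decay are assumptions, not details from the paper.

```python
# Generic pre-norm ViT classifier in Keras. Reported values: 7x7 patches,
# projection dim 64, 4 heads, 8 transformer layers, MLP head units [56, 28],
# lr 0.001, weight decay 0.0001, 100 epochs. Image size, class count,
# dropout rates, and the AdamW optimizer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

IMAGE_SIZE, PATCH_SIZE = 28, 7                     # image size is an assumption
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2      # 16 patch tokens
PROJECTION_DIM, NUM_HEADS, NUM_LAYERS = 64, 4, 8   # reported values
MLP_HEAD_UNITS, NUM_CLASSES = [56, 28], 8          # class count is assumed

class PatchEncoder(layers.Layer):
    """Add learned position embeddings to the projected patch tokens."""
    def __init__(self, num_patches, projection_dim):
        super().__init__()
        self.num_patches = num_patches
        self.position_embedding = layers.Embedding(num_patches, projection_dim)
    def call(self, tokens):
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        return tokens + self.position_embedding(positions)

inputs = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
# Split the image into non-overlapping 7x7 patches, project each to a token.
x = layers.Conv2D(PROJECTION_DIM, PATCH_SIZE, strides=PATCH_SIZE)(inputs)
tokens = PatchEncoder(NUM_PATCHES, PROJECTION_DIM)(
    layers.Reshape((NUM_PATCHES, PROJECTION_DIM))(x))

for _ in range(NUM_LAYERS):
    # Multi-head self-attention: every token attends to every other token
    # from the first layer onward, the property the highlights cite.
    x1 = layers.LayerNormalization(epsilon=1e-6)(tokens)
    attn = layers.MultiHeadAttention(num_heads=NUM_HEADS,
                                     key_dim=PROJECTION_DIM)(x1, x1)
    x2 = layers.Add()([attn, tokens])
    # Feed-forward sub-block with residual connection.
    x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
    x3 = layers.Dense(PROJECTION_DIM * 2, activation="gelu")(x3)
    x3 = layers.Dropout(0.1)(x3)
    x3 = layers.Dense(PROJECTION_DIM)(x3)
    tokens = layers.Add()([x3, x2])

# Classification head with the reported [56, 28] MLP head units.
x = layers.Flatten()(layers.LayerNormalization(epsilon=1e-6)(tokens))
for units in MLP_HEAD_UNITS:
    x = layers.Dropout(0.1)(layers.Dense(units, activation="gelu")(x))
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3,
                                                  weight_decay=1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=100)          # 100 epochs, as reported
```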
ISSN: 1746-8094, 1746-8108
DOI: 10.1016/j.bspc.2024.106084