Stacked convolutional auto-encoder representations with spatial attention for efficient diabetic retinopathy diagnosis


Bibliographic details
Published in: Multimedia Tools and Applications, 2022-09, Vol. 81 (22), pp. 32033-32056
Author: Bodapati, Jyostna Devi
Format: Article
Language: English
Online access: Full text
Description
Abstract: Recently, the attention mechanism has been effectively applied in convolutional neural networks to boost performance on several computer vision tasks. Recognizing its potential in medical imaging, we present an end-to-end-trainable spatial-attention-based convolutional neural network architecture for recognizing the severity level of diabetic retinopathy. Initially, spatial representations of the fundus images are projected into a reduced space using a stacked convolutional auto-encoder. To enhance discrimination in the reduced space, the auto-encoder is trained jointly with the classifier in an end-to-end manner. The attention mechanism introduced in the classification module places greater emphasis on lesion regions than on non-lesion regions. The proposed model is evaluated on two benchmark datasets, and the experimental outcomes indicate that joint training favors stability and complements the learned representations when used together with attention. The proposed approach outperforms several existing models, achieving accuracies of 84.17% and 63.24% on the Kaggle APTOS19 and IDRiD datasets, respectively. In addition, ablation studies validate our contributions and the behavior of the proposed model on both datasets.
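The core idea in the abstract, reweighting feature-map locations so lesion regions receive higher emphasis, can be sketched in a minimal numpy form. This is a hypothetical illustration, not the paper's exact formulation (the abstract does not specify the attention function): spatial scores are computed by a 1x1-convolution-style dot product and normalized with a softmax over all spatial positions.

```python
import numpy as np

def spatial_attention(features, w):
    """Reweight a (C, H, W) encoder feature map by a spatial attention mask.

    features: (C, H, W) feature map from the stacked auto-encoder
    w: (C,) scoring weights, acting like a 1x1 convolution
    Returns the attended feature map and the (H, W) attention mask.
    (Minimal sketch; the paper's actual attention form may differ.)
    """
    C, H, W = features.shape
    # Score each spatial location: dot product over channels -> (H, W)
    scores = np.tensordot(w, features, axes=([0], [0])).reshape(-1)
    # Softmax over all H*W locations (numerically stabilized)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    mask = weights.reshape(H, W)
    # Broadcasting scales every channel by the spatial mask,
    # emphasizing high-scoring (lesion-like) regions
    return features * mask, mask

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
attended, mask = spatial_attention(feats, rng.standard_normal(8))
```

Because the mask is a probability distribution over spatial positions, the weighting suppresses low-scoring regions rather than hard-cropping them, which keeps the module differentiable for the end-to-end joint training the abstract describes.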
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-022-12811-5