Attention-based multi-scale feature fusion network for myopia grading using optical coherence tomography images

Bibliographic details
Published in: The Visual Computer, 2024-09, Vol. 40 (9), p. 6627-6638
Authors: Huang, Gengyou; Wen, Yang; Qian, Bo; Bi, Lei; Chen, Tingli; Sheng, Bin
Format: Article
Language: English
Abstract: Myopia is a serious threat to eye health and can even cause blindness, so it is important to grade myopia and carry out targeted intervention. Recently, various studies have used deep learning models based on optical coherence tomography (OCT) images to screen for high myopia. However, because the regions of interest (ROIs) of pre-myopia and low myopia in OCT images are relatively small, detailed myopia grading from OCT images is difficult, and few studies have attempted it. To address these problems, we propose a novel attention-based multi-scale feature fusion network, named AMFF, for myopia grading using OCT images. The proposed AMFF consists of five main modules: a pre-trained vision transformer (ViT) module, a multi-scale convolutional module, an attention feature fusion module, an Avg-TopK pooling module, and a fully connected (FC) classifier. First, the ViT is pre-trained on the training set in an unsupervised manner to extract better feature maps. Second, multi-scale convolutional layers extract multi-scale feature maps, enlarging the receptive fields and capturing scale-invariant features. Third, the feature maps of different scales are fused through channel attention and spatial attention to obtain more meaningful features. Finally, the most prominent features are obtained as a weighted average of the highest activation values of each channel and are passed to a fully connected layer to classify myopia. Extensive experiments show that the proposed model outperforms other state-of-the-art myopia grading models.
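To make the pooling step of the abstract concrete, the sketch below shows one way the Avg-TopK idea (averaging the K highest activations of each channel) could be implemented in PyTorch. It is a minimal illustration under stated assumptions: the module name AvgTopKPool, the value of K, and the plain (unweighted) mean over the top-K activations are placeholders, not the authors' exact implementation, which is described as a weighted average.

```python
import torch
import torch.nn as nn


class AvgTopKPool(nn.Module):
    """Pool each channel by averaging its K largest activations.

    Illustrative sketch only: name, K, and the uniform weighting are
    assumptions, not the paper's exact Avg-TopK pooling module.
    """

    def __init__(self, k: int = 8):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) fused feature maps
        b, c, h, w = x.shape
        flat = x.reshape(b, c, h * w)          # flatten spatial dimensions
        k = min(self.k, h * w)                 # guard against small feature maps
        topk_vals, _ = flat.topk(k, dim=-1)    # K highest activations per channel
        return topk_vals.mean(dim=-1)          # (batch, channels) pooled descriptor


if __name__ == "__main__":
    pool = AvgTopKPool(k=8)
    feats = torch.randn(2, 256, 14, 14)        # dummy fused feature maps
    pooled = pool(feats)                       # shape: (2, 256)
    print(pooled.shape)                        # pooled vector fed to the FC classifier
```

In the described architecture, the pooled per-channel descriptor would then be passed to the fully connected classifier to predict the myopia grade.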
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-023-03189-y