Medical supervised masked autoencoders: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification
Saved in:
Main Authors: , , , ,
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Summary: Masked autoencoders (MAEs) have displayed significant potential in the classification and semantic segmentation of medical images over the past year. Due to the high similarity of human tissues, even slight changes in medical images may indicate diseased tissue, necessitating fine-grained inspection to pinpoint it. The random masking strategy of MAEs is therefore likely to cause lesion areas to be overlooked by the model. At the same time, inconsistencies between the pre-training and fine-tuning phases impede the performance and efficiency of MAEs in medical image classification. To address these issues, we propose a medical supervised masked autoencoder (MSMAE) in this paper. In the pre-training phase, MSMAE precisely masks medical images via attention maps obtained from supervised training, contributing to the representation learning of human tissue in lesion areas. During the fine-tuning phase, MSMAE likewise uses attention to drive the precise masking of medical images. This improves the computational efficiency of MSMAE while increasing the difficulty of fine-tuning, which indirectly improves the quality of its medical diagnosis. Extensive experiments demonstrate that MSMAE achieves state-of-the-art performance on three official medical datasets covering various diseases. Transfer learning with MSMAE also demonstrates the great potential of our approach for medical semantic segmentation tasks. Moreover, MSMAE accelerates inference in the fine-tuning phase by 11.2% and reduces the number of floating-point operations (FLOPs) by 74.08% compared to a traditional MAE.
DOI: 10.48550/arxiv.2305.05871
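
The attention-guided masking described in the summary can be illustrated with a minimal sketch; this is not the authors' released code. It assumes per-patch attention scores come from a supervised Vision Transformer, that high-attention patches (more likely to cover lesion tissue) are preferentially masked so the model must reconstruct them, and that the mask ratio follows MAE's convention; the keep/mask policy and all names below are assumptions for illustration.

```python
# Sketch of attention-guided patch masking for an MAE-style encoder.
import torch

def attention_guided_mask(patch_tokens: torch.Tensor,
                          attn: torch.Tensor,
                          mask_ratio: float = 0.75):
    """patch_tokens: (B, N, D) patch embeddings; attn: (B, N) per-patch attention scores."""
    B, N, D = patch_tokens.shape
    num_keep = int(N * (1.0 - mask_ratio))

    # Rank patches by attention; here we KEEP the lowest-attention patches and
    # MASK the highest-attention ones (an assumed policy, not confirmed by the abstract).
    ids_sorted = torch.argsort(attn, dim=1, descending=False)  # low attention first
    ids_keep = ids_sorted[:, :num_keep]

    # Gather the visible tokens that the encoder will actually process.
    visible = torch.gather(
        patch_tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # Binary mask over all patches: 1 = masked (to be reconstructed), 0 = visible.
    mask = torch.ones(B, N, device=patch_tokens.device)
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask, ids_keep

# Example with random tensors standing in for a ViT's patch embeddings and attention.
tokens = torch.randn(2, 196, 768)      # 14x14 patches, ViT-B width
attn = torch.rand(2, 196)              # per-patch attention scores
visible, mask, ids_keep = attention_guided_mask(tokens, attn)
print(visible.shape, mask.sum(dim=1))  # (2, 49, 768); 147 patches masked per image
```

Because only the visible quarter of the patches is fed to the encoder, a scheme like this also explains the reported reduction in fine-tuning FLOPs relative to processing the full token sequence.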