VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial int...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We present VisionFM, a foundation model pre-trained with 3.4 million
ophthalmic images from 560,457 individuals, covering a broad range of
ophthalmic diseases, modalities, imaging devices, and demography. After
pre-training, VisionFM provides a foundation to foster multiple ophthalmic
artificial intelligence (AI) applications, such as disease screening and
diagnosis, disease prognosis, subclassification of disease phenotype, and
systemic biomarker and disease prediction, with each application enhanced with
expert-level intelligence and accuracy. The generalist intelligence of VisionFM
outperformed ophthalmologists with basic and intermediate levels in jointly
diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale
ophthalmic disease diagnosis benchmark database, as well as a new large-scale
segmentation and detection benchmark database, VisionFM outperformed strong
baseline deep neural networks. The ophthalmic image representations learned by
VisionFM exhibited noteworthy explainability, and demonstrated strong
generalizability to new ophthalmic modalities, disease spectrum, and imaging
devices. As a foundation model, VisionFM has a large capacity to learn from
diverse ophthalmic imaging data and disparate datasets. To be commensurate with
this capacity, in addition to the real data used for pre-training, we also
generated and leveraged synthetic ophthalmic imaging data. Experimental results
revealed that synthetic data that passed visual Turing tests, can also enhance
the representation learning capability of VisionFM, leading to substantial
performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI
applications developed, validated, and demonstrated in this work, substantial
further applications can be achieved in an efficient and cost-effective manner
using VisionFM as the foundation. |
---|---|
DOI: | 10.48550/arxiv.2310.04992 |