Quality assessment of anatomical MRI images from generative adversarial networks: Human assessment and image quality metrics
Published in: | Journal of neuroscience methods 2022-05, Vol.374, p.109579-109579, Article 109579 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Generative Adversarial Networks (GANs) can synthesize brain images from image or noise input. So far, the gold standard for assessing the quality of the generated images has been human expert ratings. However, due to limitations of human assessment in terms of cost, scalability, and the limited sensitivity of the human eye to more subtle statistical relationships, a more automated approach towards evaluating GANs is required.
We investigated to what extent visual quality can be assessed using image quality metrics and we used group analysis and spatial independent components analysis to verify that the GAN reproduces multivariate statistical relationships found in real data. Reference human data was obtained by recruiting neuroimaging experts to assess real Magnetic Resonance (MR) images and images generated by a GAN. Image quality was manipulated by exporting images at different stages of GAN training.
Experts were sensitive to changes in image quality as evidenced by ratings and reaction times, and the generated images reproduced group effects (age, gender) and spatial correlations moderately well. We also surveyed a number of image quality metrics. Overall, Fréchet Inception Distance (FID), Maximum Mean Discrepancy (MMD) and Naturalness Image Quality Evaluator (NIQE) showed sensitivity to image quality and good correspondence with the human data, especially for lower-quality images (i.e., images from early stages of GAN training). However, only a Deep Quality Assessment (QA) model trained on human ratings was able to reproduce the subtle differences between higher-quality images.
We recommend a combination of group analyses, spatial correlation analyses, and both distortion metrics (FID, MMD, NIQE) and perceptual models (Deep QA) for a comprehensive evaluation and comparison of brain images produced by GANs.
• Generative modeling of gray-matter density maps of Magnetic Resonance Images (MRI).
• Generative Adversarial Network (GAN) creates 3D maps.
• Detection task and subjective rating task with neuroimaging experts.
• Survey of image quality metrics Inception Score, MIS, FID, MMD, NIQE, BRISQUE.
• Deep Quality Assessment model replicating perceptual quality ratings. |
---|---|
ISSN: | 0165-0270 1872-678X |
DOI: | 10.1016/j.jneumeth.2022.109579 |
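
The abstract names Maximum Mean Discrepancy (MMD) as one of the metrics that tracked image quality. As an illustration only, the following is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel. This record does not say which kernel, bandwidth, or feature representation the authors used, so the function names (`rbf_kernel`, `mmd2_biased`), the bandwidth `sigma`, and the use of random vectors in place of image features are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y (rows = items)."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical stand-ins for feature vectors of real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 16))
good_fake = rng.normal(size=(200, 16))          # same distribution as `real`
poor_fake = rng.normal(loc=1.0, size=(200, 16))  # shifted distribution

mmd_good = mmd2_biased(real, good_fake, sigma=4.0)
mmd_poor = mmd2_biased(real, poor_fake, sigma=4.0)
```

In this toy setup the shifted sample yields the larger value, which mirrors the general idea that a sample set further from the real distribution scores worse. In practice the inputs would be features extracted from real and generated images rather than random vectors.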