No-Reference Image Quality Assessment Combining Swin-Transformer and Natural Scene Statistics

Bibliographic details
Published in: Sensors (Basel, Switzerland), 2024-08, Vol. 24 (16), p. 5221
Main authors: Yang, Yuxuan; Lei, Zhichun; Li, Changlu
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: No-reference image quality assessment aims to evaluate image quality in line with human subjective perception. Current methods struggle to attend to global and local information simultaneously and lose information when images are resized. To address these issues, we propose a model that combines a Swin-Transformer with natural scene statistics. The model uses the Swin-Transformer to extract multi-scale features and incorporates a feature enhancement module and deformable convolution to improve feature representation and adapt to structural variations in images; a dual-branch attention mechanism focuses on key regions and aligns the assessment more closely with human visual perception. The natural scene statistics compensate for the information loss caused by image resizing. Additionally, we use a normalized loss function to accelerate model convergence and enhance training stability. We evaluate our model on six standard image quality assessment datasets (both synthetic and authentic) and show that it achieves strong results across multiple datasets. Compared with the advanced DACNN method, our model achieves Spearman rank correlation coefficients of 0.922 and 0.923 on the KADID and KonIQ datasets, respectively, improvements of 1.9% and 2.4%, and it performs well on both synthetic and authentic scenes.
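The abstract gives only a high-level picture of the architecture, so the following is a minimal, illustrative sketch of the two-branch idea it describes: deep features from a Swin-Transformer applied to the resized image, fused with natural-scene-statistics (NSS) features computed on the full-resolution image to compensate for resizing loss, and regressed to a single quality score. This is not the authors' code: the backbone choice (torchvision's Swin-T), the MSCN-based NSS summary, the fusion MLP, and all module names and dimensions are assumptions made for illustration, and the paper's feature enhancement module, deformable convolution, dual-branch attention, and normalized loss are omitted.

```python
# Illustrative sketch only; module names, dimensions, and NSS statistics are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import swin_t


def mscn_statistics(gray: torch.Tensor, kernel_size: int = 7, sigma: float = 7 / 6) -> torch.Tensor:
    """Global moments of MSCN coefficients, a classic NSS summary (BRISQUE-style).

    gray: (B, 1, H, W) luminance in [0, 1] at the original resolution, so the
    statistics retain detail that is lost when the image is resized for the transformer.
    """
    # Gaussian window for local mean / local contrast estimates.
    coords = torch.arange(kernel_size, dtype=gray.dtype, device=gray.device) - kernel_size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, -1)
    window = (g.transpose(1, 2) @ g).view(1, 1, kernel_size, kernel_size)

    mu = F.conv2d(gray, window, padding=kernel_size // 2)
    sigma_local = torch.sqrt(torch.clamp(
        F.conv2d(gray * gray, window, padding=kernel_size // 2) - mu ** 2, min=0.0))
    mscn = (gray - mu) / (sigma_local + 1e-6)

    # Summarize the MSCN field with a few global moments (illustrative choice of statistics).
    flat = mscn.flatten(1)
    return torch.stack([flat.mean(1), flat.std(1), flat.abs().mean(1),
                        (flat ** 4).mean(1)], dim=1)  # (B, 4)


class SwinNSSQualityModel(nn.Module):
    """Hypothetical fusion of pooled Swin-Transformer features with NSS statistics."""

    def __init__(self, nss_dim: int = 4):
        super().__init__()
        self.backbone = swin_t(weights=None)        # Swin-T as a stand-in backbone
        feat_dim = self.backbone.head.in_features   # 768 for Swin-T
        self.backbone.head = nn.Identity()          # keep pooled features only
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim + nss_dim, 256), nn.GELU(),
            nn.Linear(256, 1))                      # scalar quality score

    def forward(self, resized_rgb: torch.Tensor, full_res_gray: torch.Tensor) -> torch.Tensor:
        deep = self.backbone(resized_rgb)           # (B, 768) from the resized image
        nss = mscn_statistics(full_res_gray)        # (B, 4) from the full-resolution image
        return self.regressor(torch.cat([deep, nss], dim=1)).squeeze(1)


if __name__ == "__main__":
    model = SwinNSSQualityModel()
    scores = model(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 512, 384))
    print(scores.shape)  # torch.Size([2])
```

In this sketch the NSS branch sees the image at its native resolution, which is the mechanism the abstract credits with recovering information that the transformer branch loses through resizing.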
ISSN: 1424-8220
DOI: 10.3390/s24165221