ATTIQA: Generalizable Image Quality Feature Extractor using Attribute-aware Pretraining
| Field | Value |
|---|---|
| Main authors: | , , , , , |
| Format: | Article |
| Language: | English |
| Subject terms: |  |
| Online access: | Order full text |
Abstract: In no-reference image quality assessment (NR-IQA), limited dataset sizes hamper the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Some approaches also propose vision-language model (VLM)-based IQA, but the domain gap between generic VLMs and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from a VLM and leveraging the scalability of large datasets. Specifically, we select optimal text prompts for five representative image quality attributes and use the VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated from large image datasets, allowing our IQA model to learn rich representations of image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability.
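To make the pseudo-labeling step concrete, below is a minimal sketch of how attribute-aware pseudo-labels could be generated with a CLIP-style VLM. The five attribute names, the antonym prompt pairs, and the checkpoint are illustrative assumptions for this sketch, not the paper's exact prompt selection.

```python
# A minimal sketch of VLM-based pseudo-labeling for quality attributes,
# assuming CLIP as the VLM. Attribute names and prompts are illustrative,
# not the prompts selected by the ATTIQA authors.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# One antonym prompt pair per assumed quality attribute; the softmax over
# the pair's image-text similarities gives a pseudo-label in [0, 1].
ATTRIBUTE_PROMPTS = {
    "sharpness":    ("a sharp photo", "a blurry photo"),
    "contrast":     ("a high contrast photo", "a low contrast photo"),
    "brightness":   ("a bright photo", "a dark photo"),
    "colorfulness": ("a colorful photo", "a dull photo"),
    "noisiness":    ("a clean photo", "a noisy photo"),
}

@torch.no_grad()
def attribute_pseudo_labels(image: Image.Image) -> dict[str, float]:
    """Score one image against each attribute's positive/negative prompts."""
    labels = {}
    for attr, (pos, neg) in ATTRIBUTE_PROMPTS.items():
        inputs = processor(text=[pos, neg], images=image,
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image  # shape (1, 2)
        # Probability mass on the positive prompt is the pseudo-label.
        labels[attr] = logits.softmax(dim=-1)[0, 0].item()
    return labels
```

Under this reading, the pseudo-labels would then supervise an IQA backbone on a large unlabeled image corpus, for example via a per-attribute regression loss, which is what lets the representation scale beyond hand-annotated IQA datasets.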
DOI: 10.48550/arxiv.2406.01020