Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment
Format: Article
Language: English
Abstract: Image quality assessment (IQA) serves as the gold standard for evaluating models'
performance in nearly all computer vision fields. However, it still suffers
from poor out-of-distribution generalization ability and expensive training
costs. To address these problems, we propose Dog-IQA, a standard-guided
zero-shot mix-grained IQA method, which is training-free and utilizes the
exceptional prior knowledge of multimodal large language models (MLLMs). To
obtain accurate IQA scores, namely scores consistent with humans, we design an
MLLM-based inference pipeline that imitates human experts. In detail, Dog-IQA
applies two techniques. First, Dog-IQA objectively scores with specific
standards that utilize MLLM's behavior pattern and minimize the influence of
subjective factors. Second, Dog-IQA comprehensively takes local semantic
objects and the whole image as input and aggregates their scores, leveraging
local and global information. Our proposed Dog-IQA achieves state-of-the-art
(SOTA) performance compared with training-free methods, and competitive
performance compared with training-based methods in cross-dataset scenarios.
Our code will be available at https://github.com/Kai-Liu001/Dog-IQA.
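The mix-grained scheme described above (per-object scores from the MLLM combined with a whole-image score) can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the area-weighted average and the `alpha` mixing weight are assumptions made here for clarity.

```python
def aggregate_scores(object_scores, object_areas, global_score, alpha=0.5):
    """Combine local object scores with a global image score.

    object_scores : quality score the MLLM assigned to each semantic object
    object_areas  : pixel area of each object (used here as an assumed weight)
    global_score  : quality score the MLLM assigned to the whole image
    alpha         : assumed mixing weight between local and global terms
    """
    total_area = sum(object_areas)
    # Area-weighted mean of the local scores (illustrative choice).
    local = sum(s * a for s, a in zip(object_scores, object_areas)) / total_area
    # Blend local and global information into one final score.
    return alpha * local + (1 - alpha) * global_score


# Example: two objects scored 4 and 2, whole image scored 3.
final = aggregate_scores([4, 2], [100, 300], 3)
```

With these inputs the local term is (4·100 + 2·300) / 400 = 2.5, so the blended score is 0.5·2.5 + 0.5·3 = 2.75.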
DOI: 10.48550/arxiv.2410.02505