Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model
Saved in:
Main authors: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: In recent years, artificial intelligence (AI)-driven video generation has
gained significant attention. Consequently, there is a growing need for
accurate video quality assessment (VQA) metrics to evaluate the perceptual
quality of AI-generated content (AIGC) videos and optimize video generation
models. However, assessing the quality of AIGC videos remains a significant
challenge because these videos often exhibit highly complex distortions, such
as unnatural actions and irrational objects. To address this challenge, we
systematically investigate the AIGC-VQA problem, considering both subjective
and objective quality assessment perspectives. For the subjective perspective,
we construct the Large-scale Generated Video Quality assessment (LGVQ) dataset,
consisting of 2,808 AIGC videos generated by 6 video generation models using
468 carefully curated text prompts. We evaluate the perceptual quality of AIGC
videos from three critical dimensions: spatial quality, temporal quality, and
text-video alignment. For the objective perspective, we establish a benchmark
for evaluating existing quality assessment metrics on the LGVQ dataset. Our
findings show that current metrics perform poorly on this dataset, highlighting
a gap in effective evaluation tools. To bridge this gap, we propose the Unify
Generated Video Quality assessment (UGVQ) model, designed to accurately
evaluate the multi-dimensional quality of AIGC videos. The UGVQ model
integrates the visual and motion features of videos with the textual features
of their corresponding prompts, forming a unified quality-aware feature
representation tailored to AIGC videos. Experimental results demonstrate that
UGVQ achieves state-of-the-art performance on the LGVQ dataset across all three
quality dimensions. Both the LGVQ dataset and the UGVQ model are publicly
available at https://github.com/zczhang-sjtu/UGVQ.git.
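
The benchmark described in the abstract compares existing quality metrics against human ratings on LGVQ. As a minimal sketch of how such a comparison is typically scored in VQA work, the snippet below computes Spearman (SRCC) and Pearson (PLCC) correlations between a metric's predictions and mean opinion scores; the choice of correlation criteria, the function names, and the data are assumptions, since this record does not spell out the paper's exact protocol.

```python
# Sketch: correlating a quality metric's predictions with human scores,
# as is standard in VQA benchmarking (SRCC/PLCC). Illustrative only; the
# LGVQ benchmark's exact protocol is not specified in this record.
import numpy as np
from scipy import stats

def evaluate_metric(predicted_scores, human_scores):
    """Return Spearman (SRCC) and Pearson (PLCC) correlations."""
    srcc, _ = stats.spearmanr(predicted_scores, human_scores)
    plcc, _ = stats.pearsonr(predicted_scores, human_scores)
    return srcc, plcc

# Hypothetical example: scores for 5 AIGC videos on one quality dimension.
pred = np.array([0.62, 0.48, 0.71, 0.33, 0.55])   # metric outputs
mos  = np.array([3.8, 2.9, 4.2, 2.1, 3.4])        # mean opinion scores
srcc, plcc = evaluate_metric(pred, mos)
print(f"SRCC={srcc:.3f}, PLCC={plcc:.3f}")
```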
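The abstract also outlines UGVQ's overall design: visual and motion features of a video are fused with textual features of its prompt into one quality-aware representation, from which the three dimension scores are predicted. The PyTorch sketch below illustrates that fusion pattern under stated assumptions; all module names, feature dimensions, and layer choices are hypothetical and not the paper's actual architecture.

```python
# Sketch of a tri-modal fusion regressor in the spirit of UGVQ: visual,
# motion, and prompt-text features are concatenated into one representation,
# then mapped to three quality scores (spatial, temporal, text-video
# alignment). Dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class FusionQualityModel(nn.Module):
    def __init__(self, vis_dim=768, mot_dim=256, txt_dim=512, hid=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + mot_dim + txt_dim, hid),
            nn.ReLU(),
        )
        # One regression head per quality dimension.
        self.heads = nn.ModuleDict({
            name: nn.Linear(hid, 1)
            for name in ("spatial", "temporal", "alignment")
        })

    def forward(self, vis_feat, mot_feat, txt_feat):
        z = self.fuse(torch.cat([vis_feat, mot_feat, txt_feat], dim=-1))
        return {name: head(z).squeeze(-1) for name, head in self.heads.items()}

# Hypothetical batch of pre-extracted features for 4 videos.
model = FusionQualityModel()
scores = model(torch.randn(4, 768), torch.randn(4, 256), torch.randn(4, 512))
print({k: v.shape for k, v in scores.items()})
```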
DOI: 10.48550/arxiv.2407.21408