Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of...

Bibliographic details
Published in: arXiv.org 2024-07
Main authors: Zhang, Zhichao; Li, Xinyue; Sun, Wei; Jia, Jun; Xiongkuo Min; Zhang, Zicheng; Li, Chunyi; Chen, Zijian; Wang, Puyi; Ji, Zhongpeng; Sun, Fengyu; Shangling Jui; Zhai, Guangtao
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Zhang, Zhichao
Li, Xinyue
Sun, Wei
Jia, Jun
Xiongkuo Min
Zhang, Zicheng
Li, Chunyi
Chen, Zijian
Wang, Puyi
Ji, Zhongpeng
Sun, Fengyu
Shangling Jui
Zhai, Guangtao
description In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3087030809</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3087030809</sourcerecordid><originalsourceid>FETCH-proquest_journals_30870308093</originalsourceid><addsrcrecordid>eNqNirEOgjAUABsTE4nyDy9xJqmtCLohKjo4mKgraexDi1CUVwb_XgY_wOVuuBswT0g5C-K5ECPmE5Wcc7GIRBhKj2VrtLdHrdqnsXdIDlkKV6OxgVOnKuM-kBAhUY3WrSCBjXKK0IGyGi7WFAY1HBuN1YQNC1UR-j-P2XS3Paf74NU27w7J5WXTtbZPueRxxHvwpfzv-gJUDzpE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3087030809</pqid></control><display><type>article</type><title>Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model</title><source>Free E- Journals</source><creator>Zhang, Zhichao ; Li, Xinyue ; Sun, Wei ; Jia, Jun ; Xiongkuo Min ; Zhang, Zicheng ; Li, Chunyi ; Chen, Zijian ; Wang, Puyi ; Ji, Zhongpeng ; Sun, Fengyu ; Shangling Jui ; Zhai, Guangtao</creator><creatorcontrib>Zhang, Zhichao ; Li, Xinyue ; Sun, Wei ; Jia, Jun ; Xiongkuo Min ; Zhang, Zicheng ; Li, Chunyi ; Chen, Zijian ; Wang, Puyi ; Ji, Zhongpeng ; Sun, Fengyu ; Shangling Jui ; Zhai, Guangtao</creatorcontrib><description>In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. 
For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Artificial intelligence ; Benchmarks ; Datasets ; Large language models ; Quality assessment ; Video ; Visual aspects</subject><ispartof>arXiv.org, 2024-07</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Zhang, Zhichao</creatorcontrib><creatorcontrib>Li, Xinyue</creatorcontrib><creatorcontrib>Sun, Wei</creatorcontrib><creatorcontrib>Jia, Jun</creatorcontrib><creatorcontrib>Xiongkuo Min</creatorcontrib><creatorcontrib>Zhang, Zicheng</creatorcontrib><creatorcontrib>Li, Chunyi</creatorcontrib><creatorcontrib>Chen, Zijian</creatorcontrib><creatorcontrib>Wang, Puyi</creatorcontrib><creatorcontrib>Ji, Zhongpeng</creatorcontrib><creatorcontrib>Sun, Fengyu</creatorcontrib><creatorcontrib>Shangling Jui</creatorcontrib><creatorcontrib>Zhai, Guangtao</creatorcontrib><title>Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model</title><title>arXiv.org</title><description>In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. 
For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. 
The LGVQ dataset and the UGVQ metric will be publicly released.</description><subject>Artificial intelligence</subject><subject>Benchmarks</subject><subject>Datasets</subject><subject>Large language models</subject><subject>Quality assessment</subject><subject>Video</subject><subject>Visual aspects</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNirEOgjAUABsTE4nyDy9xJqmtCLohKjo4mKgraexDi1CUVwb_XgY_wOVuuBswT0g5C-K5ECPmE5Wcc7GIRBhKj2VrtLdHrdqnsXdIDlkKV6OxgVOnKuM-kBAhUY3WrSCBjXKK0IGyGi7WFAY1HBuN1YQNC1UR-j-P2XS3Paf74NU27w7J5WXTtbZPueRxxHvwpfzv-gJUDzpE</recordid><startdate>20240731</startdate><enddate>20240731</enddate><creator>Zhang, Zhichao</creator><creator>Li, Xinyue</creator><creator>Sun, Wei</creator><creator>Jia, Jun</creator><creator>Xiongkuo Min</creator><creator>Zhang, Zicheng</creator><creator>Li, Chunyi</creator><creator>Chen, Zijian</creator><creator>Wang, Puyi</creator><creator>Ji, Zhongpeng</creator><creator>Sun, Fengyu</creator><creator>Shangling Jui</creator><creator>Zhai, Guangtao</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240731</creationdate><title>Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model</title><author>Zhang, Zhichao ; Li, Xinyue ; Sun, Wei ; Jia, Jun ; Xiongkuo Min ; Zhang, Zicheng ; Li, Chunyi ; Chen, Zijian ; Wang, Puyi ; Ji, Zhongpeng ; Sun, Fengyu ; Shangling Jui ; Zhai, 
Guangtao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_30870308093</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial intelligence</topic><topic>Benchmarks</topic><topic>Datasets</topic><topic>Large language models</topic><topic>Quality assessment</topic><topic>Video</topic><topic>Visual aspects</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Zhichao</creatorcontrib><creatorcontrib>Li, Xinyue</creatorcontrib><creatorcontrib>Sun, Wei</creatorcontrib><creatorcontrib>Jia, Jun</creatorcontrib><creatorcontrib>Xiongkuo Min</creatorcontrib><creatorcontrib>Zhang, Zicheng</creatorcontrib><creatorcontrib>Li, Chunyi</creatorcontrib><creatorcontrib>Chen, Zijian</creatorcontrib><creatorcontrib>Wang, Puyi</creatorcontrib><creatorcontrib>Ji, Zhongpeng</creatorcontrib><creatorcontrib>Sun, Fengyu</creatorcontrib><creatorcontrib>Shangling Jui</creatorcontrib><creatorcontrib>Zhai, Guangtao</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest 
Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Zhichao</au><au>Li, Xinyue</au><au>Sun, Wei</au><au>Jia, Jun</au><au>Xiongkuo Min</au><au>Zhang, Zicheng</au><au>Li, Chunyi</au><au>Chen, Zijian</au><au>Wang, Puyi</au><au>Ji, Zhongpeng</au><au>Sun, Fengyu</au><au>Shangling Jui</au><au>Zhai, Guangtao</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model</atitle><jtitle>arXiv.org</jtitle><date>2024-07-31</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>In recent years, artificial intelligence (AI) driven video generation has garnered significant attention due to advancements in stable diffusion and large language model techniques. Thus, there is a great demand for accurate video quality assessment (VQA) models to measure the perceptual quality of AI-generated content (AIGC) videos as well as optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging due to the highly complex distortions they exhibit (e.g., unnatural action, irrational objects, etc.). Therefore, in this paper, we try to systemically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. For the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. 
For the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on the LGVQ dataset. Thus, we propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across three aspects using a unified model, which uses visual, textual and motion features of video and corresponding prompt, and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-07
issn 2331-8422
language eng
recordid cdi_proquest_journals_3087030809
source Free E- Journals
subjects Artificial intelligence
Benchmarks
Datasets
Large language models
Quality assessment
Video
Visual aspects
title Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T19%3A25%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Benchmarking%20AIGC%20Video%20Quality%20Assessment:%20A%20Dataset%20and%20Unified%20Model&rft.jtitle=arXiv.org&rft.au=Zhang,%20Zhichao&rft.date=2024-07-31&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3087030809%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3087030809&rft_id=info:pmid/&rfr_iscdi=true