A multi-scale no-reference video quality assessment method based on transformer

Bibliographic details
Published in: Multimedia systems 2024-08, Vol.30 (4), Article 201
Main authors: Cui, Yingan, Yu, Zonghua, Feng, Yuqin, Wang, Huaijun, Li, Junhuai
Format: Article
Language: English
Description
Summary: Video quality assessment is essential for optimizing user experience, enhancing network efficiency, supporting video production and editing, improving advertising effectiveness, and strengthening security in monitoring and other domains. Current research focuses mainly on detail distortion in video while overlooking the temporal relationships between video frames and the impact of the content-dependent characteristics of the human visual system on video quality. To address this, the paper proposes a multi-scale no-reference video quality assessment method based on the transformer. Spatial features of the video are extracted using a network that combines Swin Transformer and deformable convolution, and mixed pooling of the features in each video frame preserves further information. A pyramid aggregation module then merges long-term and short-term memories, enhancing the ability to capture temporal changes. Experimental results on the public datasets KoNViD-1k, CVD2014, and LIVE-VQC demonstrate the effectiveness of the proposed method in video quality prediction.
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-024-01403-y
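
The summary mentions mixed pooling of frame features but does not define it. As a rough illustration only, the sketch below assumes one common reading: each channel of a frame's feature map is reduced to a weighted blend of its global average and its global maximum. The function name mixed_pool, the weight alpha, and the toy feature map are hypothetical and are not taken from the paper.

```python
def mixed_pool(feature_map, alpha=0.5):
    """Blend global max and global average pooling, one value per channel.

    feature_map: list of channels, each a 2-D list (H x W) of floats.
    alpha: weight on the max-pooled value (hypothetical parameter;
           the paper's actual pooling scheme may differ).
    """
    pooled = []
    for channel in feature_map:
        # Flatten the H x W grid of this channel into one list of values.
        values = [v for row in channel for v in row]
        avg = sum(values) / len(values)
        mx = max(values)
        pooled.append(alpha * mx + (1 - alpha) * avg)
    return pooled


# Toy feature map for one frame: 2 channels, each 2 x 2.
frame_features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(mixed_pool(frame_features))  # [3.25, 5.0]
```

Average pooling alone would score the second channel at 2.0 and hide its single strong activation, while max pooling alone would discard how the activations are distributed. Blending the two keeps some of each, which is one plausible sense of the information preservation the summary refers to.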