Quality Feature Learning via Multi-channel CNN and GRU for No-reference Video Quality Assessment


Bibliographic details
Published in: IEEE Access 2023-01, Vol.11, p.1-1
Main authors: Kwong, Ngai-Wing, Chan, Yui-Lam, Tsang, Sik-Ho, Lun, Daniel Pak-Kong
Format: Article
Language: English
Description
Abstract: Video quality assessment (VQA) plays a vital role in video-related industries, where predicting human-perceived video quality helps maintain the quality of service. Although many deep neural network-based VQA methods have been proposed, their robustness and performance are limited by the small scale of available human-labeled data. Recently, some transfer learning-based methods and models pre-trained in other domains have been adopted in VQA to compensate for the lack of large-scale training samples. However, they introduce a domain gap between the source and target domains, which yields sub-optimal feature representations for VQA tasks and degrades accuracy. Therefore, in this paper, we propose quality feature learning via a multi-channel convolutional neural network (CNN) with a gated recurrent unit (GRU), taking into account both motion-aware information and human visual perception (HVP) characteristics, to address this issue for no-reference VQA. First, inspired by self-supervised learning (SSL), the multi-channel CNN is pre-trained on the image quality assessment (IQA) domain without using human annotation labels. Then, semi-supervised learning is applied on top of the pre-trained multi-channel CNN to fine-tune the model, transferring it from the IQA domain to the VQA domain while considering motion-aware information for better frame-level quality feature representation. After that, several HVP features are extracted and, together with the frame-level quality feature representation, used as the input of the GRU model to obtain the final predicted video quality. Finally, the experimental results demonstrate the robustness and validity of our model, which outperforms state-of-the-art approaches and whose predictions are closely related to human perception.
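The last stage described in the abstract, where frame-level quality features and HVP features are fed to a GRU to produce one video-level score, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TinyGRU`, the helper `build_inputs`, the feature sizes, the absence of bias terms, and the random untrained weights are all assumptions made for illustration, and the multi-channel CNN that would produce the frame-level features is not modelled here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_inputs(quality_feats, hvp_feats):
    # Hypothetical helper: concatenate per-frame quality features with
    # per-frame HVP features to form the GRU input sequence.
    return [q + h for q, h in zip(quality_feats, hvp_feats)]


class TinyGRU:
    """Single-layer GRU with a linear read-out (illustrative, untrained)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state and the candidate state via the update gate.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def predict(self, frame_inputs):
        # Run over all frames, then map the final hidden state to one score.
        h = [0.0] * self.hidden_size
        for x in frame_inputs:
            h = self.step(x, h)
        return sum(w * hi for w, hi in zip(self.w_out, h))


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder features for a 5-frame clip (sizes are arbitrary assumptions).
    quality = [[rng.random() for _ in range(4)] for _ in range(5)]
    hvp = [[rng.random() for _ in range(2)] for _ in range(5)]
    model = TinyGRU(input_size=6, hidden_size=3)
    print(model.predict(build_inputs(quality, hvp)))
```

With untrained weights the printed score is meaningless; the sketch only shows how a sequence of per-frame feature vectors is reduced to a single prediction. In practice the weights would be learned against subjective quality scores.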
ISSN:2169-3536
DOI:10.1109/ACCESS.2023.3259101