Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming

Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on image processing 2021, Vol.30, p.4622-4636
Hauptverfasser:	Zhang, Xue, Cheung, Gene, Zhao, Yao, Le Callet, Patrick, Lin, Chunyu, Tan, Jack Z. G.
Format:	Artikel
Sprache:	eng
Schlagworte:	360 video streaming Biological models (mathematics) Computer Science Data models directed graph learning Directed graphs Field of view Graph theory Head movement head movement prediction Helmet mounted displays High definition Human motion Image Processing Information sources Iterative methods Learning Markov chains Optimization Predictive models Servers Streaming media Video transmission
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.
ISSN:	1057-7149 1941-0042
DOI:	10.1109/TIP.2021.3073283