Transformer-based architecture for transform coding of media

Bibliographic details
Main authors: ZHU YINHAO, YANG YANG, COHEN, THOMAS, S
Format: Patent
Language: Chinese; English
Description
Summary: Systems and techniques for processing media data using a neural network system are described herein. For example, a process may include obtaining a latent representation of a frame of encoded image data, and generating a frame of decoded image data by a plurality of decoder transformer layers of a decoder sub-network, using the latent representation of the frame of encoded image data as input. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determining self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for reducing a respective size of each of the one or more patches.
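
The building blocks named in the summary (window partitions, shifted window partitions, self-attention computed locally inside each window, and a patch un-merging step) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names are hypothetical, the shift is implemented as a cyclic roll, the attention uses no learned projections, and patch un-merging is read as a depth-to-space rearrangement in which each patch with 4*C features becomes four neighbouring patches with C features each.

```python
import math


def window_partition(grid, window):
    """Split an H x W grid (list of rows) into non-overlapping window x window tiles."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def self_attention(tokens):
    """Scaled dot-product attention over a list of feature vectors.

    Assumption: queries, keys and values are the tokens themselves
    (no learned projection matrices), to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) / total
                    for i in range(d)])
    return out


def local_attention(grid, window, shift=0):
    """Apply self-attention separately inside each (optionally shifted) window."""
    shifted = cyclic_shift(grid, shift) if shift else grid
    results = []
    for tile in window_partition(shifted, window):
        tokens = [feat for row in tile for feat in row]  # flatten tile to tokens
        results.append(self_attention(tokens))
    return results


def patch_unmerge(grid):
    """Turn an H x W grid of 4*C-feature patches into a 2H x 2W grid of C-feature patches."""
    h, w = len(grid), len(grid[0])
    c = len(grid[0][0]) // 4
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for col in range(w):
            feat = grid[r][col]
            for k in range(4):
                dr, dc = divmod(k, 2)  # position inside the 2 x 2 output block
                out[2 * r + dr][2 * col + dc] = feat[k * c:(k + 1) * c]
    return out
```

As a usage example under the same assumptions, calling `local_attention(grid, 2)` followed by `local_attention(grid, 2, shift=1)` on a 4 x 4 grid attends first within regular 2 x 2 windows and then within windows offset by one patch, so information can cross the original window borders. `patch_unmerge` then doubles the spatial resolution while shrinking the feature vector of each patch to a quarter of its length.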