Methods, systems, and media for computer vision using 2D convolution of 4D video data tensors

Bibliographic details
Main authors: HABIB HABIB HAJI MOHRAHOSINI, KUMAR KAUSHAL, DENG, GORDON
Format: Patent
Language: Chinese; English
Description
Summary: Methods, systems, and media for computer vision using 2D convolution of 4D video data tensors are described. A 3D convolution operation on a 5D input tensor is simulated by performing 2D convolution on 4D tensors. A convolutional block of the CNN performs two parallel operations: a spatial processing branch performs spatial feature extraction on a 4D tensor using 2D convolution, and a temporal processing branch performs temporal feature extraction on a different 4D tensor using 2D convolution. The output tensors of the spatial processing branch and the temporal processing branch are combined to generate an output tensor of the convolutional block. The convolutional block may include additional operations, such as reshaping and/or further convolution operations, so that each branch generates an output tensor of the same size, thereby eliminating the need to post-process branch output tensors prior to their combination.
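
The two-branch idea in the summary can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the (N, C, T, H, W) axis order, the particular reshapes into 4D tensors, the kernel shapes, zero "same" padding, and the use of element-wise addition to combine the branches. The helper names `conv2d_same`, `spatial_branch`, `temporal_branch`, and `conv_block` are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded 2D cross-correlation (the deep-learning 'convolution').

    x: (B, C_in, H, W); w: (C_out, C_in, kh, kw) with odd kh and kw.
    Returns an array of shape (B, C_out, H, W).
    """
    B, C_in, H, W = x.shape
    C_out, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((B, C_out, H, W), dtype=x.dtype)
    # Accumulate one kernel offset at a time; each offset is a 1x1 channel mix.
    for i in range(kh):
        for j in range(kw):
            patch = xp[:, :, i:i + H, j:j + W]
            out += np.einsum("bchw,oc->bohw", patch, w[:, :, i, j])
    return out


def spatial_branch(x5, w_s):
    """Spatial features: fold time into the batch axis, then apply a 2D conv."""
    N, C, T, H, W = x5.shape
    x4 = x5.transpose(0, 2, 1, 3, 4).reshape(N * T, C, H, W)   # assumed 4D layout
    y4 = conv2d_same(x4, w_s)                                   # w_s: (C_out, C, k, k)
    return y4.reshape(N, T, -1, H, W).transpose(0, 2, 1, 3, 4)  # back to (N, C_out, T, H, W)


def temporal_branch(x5, w_t):
    """Temporal features: flatten H and W into one axis, convolve along T only."""
    N, C, T, H, W = x5.shape
    x4 = x5.reshape(N, C, T, H * W)                             # a different assumed 4D layout
    y4 = conv2d_same(x4, w_t)                                   # w_t: (C_out, C, k, 1)
    return y4.reshape(N, -1, T, H, W)


def conv_block(x5, w_s, w_t):
    """Combine both branches; addition is an assumed choice of combination."""
    return spatial_branch(x5, w_s) + temporal_branch(x5, w_t)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5, 6))    # N=2, C=3, T=4, H=5, W=6
w_s = rng.standard_normal((7, 3, 3, 3))     # 3x3 spatial kernel, 7 output channels
w_t = rng.standard_normal((7, 3, 3, 1))     # 3x1 kernel acting along T
y = conv_block(x, w_s, w_t)
print(y.shape)  # (2, 7, 4, 5, 6)
```

Under these assumptions, the spatial branch matches a 3D convolution with a 1×k×k kernel and the temporal branch matches one with a k×1×1 kernel. Both branches return tensors of the same shape, so they can be combined without further post-processing.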