Methods, systems, and media for computer vision using 2D convolution of 4D video data tensors
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Summary: | Methods, systems, and media for computer vision using 2D convolution of 4D video data tensors are described. A 3D convolution operation on a 5D input tensor is simulated by performing 2D convolutions on 4D tensors. A convolutional block of the CNN performs two parallel operations: a spatial processing branch performs spatial feature extraction on a 4D tensor using 2D convolution, and a temporal processing branch performs temporal feature extraction on a different 4D tensor using 2D convolution. The output tensors of the spatial processing branch and the temporal processing branch are combined to generate an output tensor of the convolutional block. The convolutional block may include additional operations, such as reshaping and/or further convolution operations, so that each branch generates an output tensor of the same size, thereby eliminating the need to post-process the branch output tensors before they are combined. |
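The two-branch idea in the summary can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the record does not say how the 4D tensors are formed or how the branch outputs are combined. The sketch assumes a 5D layout of (N, C, T, H, W), a spatial branch that folds T into the batch axis, a temporal branch that flattens H and W into a single axis and uses a (kt, 1) kernel, and combination by element-wise addition. The names `conv2d_same` and `spatiotemporal_block` are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive 2D cross-correlation with zero 'same' padding.

    x: (B, C_in, A1, A2); w: (C_out, C_in, k1, k2) with odd k1, k2.
    Returns an array of shape (B, C_out, A1, A2).
    """
    b, _, a1, a2 = x.shape
    c_out, _, k1, k2 = w.shape
    p1, p2 = k1 // 2, k2 // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p1, p1), (p2, p2)))
    out = np.zeros((b, c_out, a1, a2), dtype=float)
    for i in range(k1):
        for j in range(k2):
            patch = xp[:, :, i:i + a1, j:j + a2]
            # Mix input channels into output channels for this kernel tap.
            out += np.einsum("bcxy,oc->boxy", patch, w[:, :, i, j])
    return out


def spatiotemporal_block(x, w_spatial, w_temporal):
    """Two parallel 2D-convolution branches over a 5D video tensor.

    x: (N, C, T, H, W)
    w_spatial:  (C_out, C, kh, kw)  (assumed spatial kernel)
    w_temporal: (C_out, C, kt, 1)   (assumed temporal kernel)
    """
    n, c, t, h, w = x.shape
    c_out = w_spatial.shape[0]

    # Spatial branch (assumption): fold time into batch -> (N*T, C, H, W).
    xs = x.transpose(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    ys = conv2d_same(xs, w_spatial)
    ys = ys.reshape(n, t, c_out, h, w).transpose(0, 2, 1, 3, 4)

    # Temporal branch (assumption): flatten H and W -> (N, C, T, H*W).
    xt = x.reshape(n, c, t, h * w)
    yt = conv2d_same(xt, w_temporal)
    yt = yt.reshape(n, c_out, t, h, w)

    # Both branches now have shape (N, C_out, T, H, W); combine by addition
    # (the combination rule is an assumption of this sketch).
    return ys + yt


rng = np.random.default_rng(0)
video = rng.standard_normal((2, 3, 4, 5, 6))
out = spatiotemporal_block(
    video,
    rng.standard_normal((8, 3, 3, 3)),
    rng.standard_normal((8, 3, 3, 1)),
)
print(out.shape)
```

Under these assumptions, adding the two branch outputs is equivalent to a 3D convolution whose kernel is nonzero only on the central spatial plane and the central temporal line. Because both branches are reshaped back to the same 5D shape before the addition, no further post-processing is needed to combine them.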