On Exploring PDE Modeling for Point Cloud Video Representation Learning
Saved in:
Format: Article
Language: eng
Online access: order full text
Abstract: Point cloud video representation learning is challenging due to complex
structures and unordered spatial arrangement. Traditional methods struggle with
frame-to-frame correlations and point-wise correspondence tracking. Recently,
partial differential equations (PDEs) have provided a new perspective for
uniformly modeling spatial-temporal information under certain constraints.
Since tracking tangible point correspondence remains challenging, we propose to
formalize point cloud video representation learning as a PDE-solving problem.
Inspired by fluid analysis, where PDEs are used to solve the deformation of a
spatial shape over time, we employ PDEs to solve for the variations of spatial
points affected by temporal information. By modeling spatial-temporal
correlations, we aim to regularize spatial variations with temporal features,
thereby enhancing representation learning in point cloud videos. We introduce
Motion PointNet, composed of a PointNet-like encoder and a PDE-solving module.
Initially, we construct a lightweight yet effective encoder to model an initial
state of the spatial variations. Subsequently, we develop our PDE-solving
module in a parameterized latent space, tailored to address the spatio-temporal
correlations inherent in point cloud videos. The PDE-solving process is
guided and refined by a contrastive learning structure, which is pivotal in
reshaping the feature distribution and thereby optimizing the feature
representation of point cloud video data. Remarkably, our Motion PointNet
achieves an impressive accuracy of 97.52% on the MSRAction-3D dataset,
surpassing the current state-of-the-art in all aspects while consuming minimal
resources (only 0.72M parameters and 0.82G FLOPs).
DOI: 10.48550/arxiv.2404.04720
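This record contains no implementation details, so the following is only a minimal PyTorch sketch of the pipeline the abstract describes: a PointNet-like per-frame encoder that produces an initial state of the spatial features, followed by a latent-space PDE-solving module. All class names, layer sizes, the explicit-Euler update, and the classification head are assumptions for illustration, not the authors' code; the contrastive-learning guidance mentioned in the abstract is omitted.

```python
# Illustrative sketch only; module names, sizes, and the Euler-step "PDE solver"
# are assumptions, not the published Motion PointNet implementation.
import torch
import torch.nn as nn


class PointNetLikeEncoder(nn.Module):
    """Shared point-wise MLP with max pooling per frame (PointNet-style), assumed structure."""
    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):              # points: (B, T, N, 3) point cloud video
        feats = self.mlp(points)             # (B, T, N, feat_dim)
        return feats.max(dim=2).values       # pool over points -> (B, T, feat_dim)


class PDESolvingModule(nn.Module):
    """Toy latent-space solver: explicit Euler steps of dz/dt = f(z), an assumption."""
    def __init__(self, feat_dim=128, steps=4, dt=0.25):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Tanh(),
                               nn.Linear(feat_dim, feat_dim))
        self.steps, self.dt = steps, dt

    def forward(self, z0):                   # z0: (B, T, feat_dim) initial state
        z = z0
        for _ in range(self.steps):
            z = z + self.dt * self.f(z)      # evolve spatial features over "time"
        return z


class MotionPointNetSketch(nn.Module):
    def __init__(self, feat_dim=128, num_classes=20):
        super().__init__()
        self.encoder = PointNetLikeEncoder(feat_dim=feat_dim)
        self.pde = PDESolvingModule(feat_dim=feat_dim)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, points):               # (B, T, N, 3)
        z0 = self.encoder(points)            # initial state of spatial variations
        z = self.pde(z0)                     # refined spatio-temporal features
        return self.head(z.mean(dim=1))      # temporal average -> class logits


if __name__ == "__main__":
    # Example: a batch of 2 videos, 8 frames, 256 points per frame.
    video = torch.randn(2, 8, 256, 3)
    logits = MotionPointNetSketch()(video)
    print(logits.shape)                      # torch.Size([2, 20])
```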