PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition
Format: Article
Language: English
Abstract: Recognizing human actions from point cloud sequences has attracted tremendous
attention from both academia and industry due to its wide applications.
However, most previous studies on point cloud action recognition typically
require complex networks to extract intra-frame spatial features and
inter-frame temporal features, resulting in an excessive number of redundant
computations. This leads to high latency, rendering them impractical for
real-world applications. To address this problem, we propose a Plane-Fit
Redundancy Encoding point cloud sequence network named PRENet. The primary
concept of our approach involves the utilization of plane fitting to mitigate
spatial redundancy within the sequence, concurrently encoding the temporal
redundancy of the entire sequence to minimize redundant computations.
Specifically, our network comprises two principal modules: a Plane-Fit
Embedding module and a Spatio-Temporal Consistency Encoding module. The
Plane-Fit Embedding module capitalizes on the observation that successive point
cloud frames exhibit unique geometric features in physical space, allowing for
the reuse of spatially encoded data for temporal stream encoding. The
Spatio-Temporal Consistency Encoding module amalgamates the temporal structure
of the temporally redundant part with its corresponding spatial arrangement,
thereby enhancing recognition accuracy. We conduct extensive experiments to
verify the effectiveness of our network. The results demonstrate that our
method achieves recognition accuracy nearly identical to other
state-of-the-art methods while running almost four times faster.
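As a rough illustration of the plane-fitting idea described in the abstract, the sketch below fits a least-squares plane to a local patch of one frame and flags points of the next frame that lie near that plane as temporally redundant, so their spatial encoding could be reused. The function names, the SVD-based fit, and the distance threshold are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of plane-fit redundancy detection (illustrative only).
# Each patch is an (N, 3) NumPy array of 3D points from one frame.
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def redundant_mask(prev_patch, curr_patch, tol=0.02):
    """Flag points in the current patch that lie close to the plane fitted to
    the previous frame's patch; these are treated as temporally redundant."""
    centroid, normal = fit_plane(prev_patch)
    dist = np.abs((curr_patch - centroid) @ normal)  # point-to-plane distance
    return dist < tol

# Example: a flat patch that barely moves between frames is mostly redundant.
rng = np.random.default_rng(0)
prev_patch = np.c_[rng.uniform(0, 1, (256, 2)), np.zeros(256)]
curr_patch = prev_patch + rng.normal(0, 0.005, prev_patch.shape)
print(redundant_mask(prev_patch, curr_patch).mean())  # fraction of reused points
```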
DOI: 10.48550/arxiv.2405.06929