Context-PIPs: Persistent Independent Particles Demands Spatial Context Features
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: We tackle the problem of Persistent Independent Particles (PIPs), also called Tracking Any Point (TAP), in videos, which specifically aims at estimating persistent long-term trajectories of query points in videos. Previous methods attempted to estimate these trajectories independently in order to incorporate longer image sequences, thereby ignoring the potential benefits of incorporating spatial context features. We argue that independent video point tracking also demands spatial context features. To this end, we propose a novel framework, Context-PIPs, which effectively improves point trajectory accuracy by aggregating spatial context features in videos. Context-PIPs contains two main modules: 1) a SOurce Feature Enhancement (SOFE) module, and 2) a TArget Feature Aggregation (TAFA) module. Context-PIPs improves PIPs across the board, reducing the Average Trajectory Error of Occluded Points (ATE-Occ) on CroHD by 11.4% and increasing the Average Percentage of Correct Keypoint (A-PCK) on TAP-Vid-Kinetics by 11.8%. Demos are available at https://wkbian.github.io/Projects/Context-PIPs/.
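
The abstract's central idea, enriching a tracked query point with features aggregated from its spatial neighbourhood before correlating against another frame, can be made concrete with a small sketch. The code below is not the authors' implementation of SOFE or TAFA; the function name `sample_context_features`, the simple mean aggregation, and the feature-map shapes are assumptions chosen only for illustration.

```python
# Illustrative sketch (not the paper's code): aggregate features from a local
# neighbourhood around a query point in the source frame, then correlate the
# aggregated descriptor against a target-frame feature map.
import torch
import torch.nn.functional as F

def sample_context_features(feat_map, query_xy, radius=3):
    """Bilinearly sample a (2r+1)x(2r+1) grid of features around query_xy.

    feat_map: (C, H, W) feature map of one frame
    query_xy: (2,) tensor holding (x, y) in pixel coordinates
    returns:  ((2r+1)^2, C) neighbourhood features
    """
    C, H, W = feat_map.shape
    # Build the grid of integer offsets covering the neighbourhood.
    offsets = torch.stack(torch.meshgrid(
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        indexing="xy"), dim=-1).reshape(-1, 2)               # (K, 2)
    coords = query_xy + offsets                               # absolute pixel coords
    # Normalise to [-1, 1] as required by grid_sample.
    norm = torch.stack([coords[:, 0] / (W - 1),
                        coords[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = norm.view(1, 1, -1, 2)                             # (1, 1, K, 2)
    sampled = F.grid_sample(feat_map[None], grid, align_corners=True)  # (1, C, 1, K)
    return sampled[0, :, 0].t()                               # (K, C)

# Example usage with random features (shapes are arbitrary assumptions).
src_feat = torch.randn(128, 64, 64)     # source-frame features (C, H, W)
tgt_feat = torch.randn(128, 64, 64)     # target-frame features
query = torch.tensor([20.0, 31.5])      # query point (x, y) in the source frame

context = sample_context_features(src_feat, query)            # (K, 128)
query_feat = context.mean(dim=0)                               # stand-in for learned aggregation
corr = torch.einsum("c,chw->hw", query_feat, tgt_feat)         # correlation map over the target
```

In the paper the aggregation is learned rather than a plain mean, but the sketch shows why a neighbourhood of source features can disambiguate a single query point when it is occluded or visually ambiguous in later frames.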
DOI: 10.48550/arxiv.2306.02000