Sum Product Networks for Activity Recognition
This paper addresses detection and localization of human activities in videos. We focus on activities that may have variable spatiotemporal arrangements of parts, and numbers of actors. Such activities are represented by a sum-product network (SPN). A product node in SPN represents a particular arra...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence 2016-04, Vol.38 (4), p.800-813 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper addresses detection and localization of human activities in videos. We focus on activities that may have variable spatiotemporal arrangements of parts, and numbers of actors. Such activities are represented by a sum-product network (SPN). A product node in SPN represents a particular arrangement of parts, and a sum node represents alternative arrangements. The sums and products are hierarchically organized, and grounded onto space-time windows covering the video. The windows provide evidence about the activity classes based on the Counting Grid (CG) model of visual words. This evidence is propagated bottom-up and top-down to parse the SPN graph for the explanation of the video. The node connectivity and model parameters of SPN and CG are jointly learned under two settings, weakly supervised, and supervised. For evaluation, we use our new Volleyball dataset, along with the benchmark datasets VIRAT, UT-Interactions, KTH, and TRECVID MED 2011. Our video classification and activity localization are superior to those of the state of the art on these datasets. |
---|---|
ISSN: | 0162-8828 1939-3539 2160-9292 |
DOI: | 10.1109/TPAMI.2015.2465955 |