Can Deep Learning Recognize Subtle Human Activities?
Format: Article
Language: English
Abstract: Deep Learning has driven recent and exciting progress in computer vision, instilling the belief that these algorithms could solve any visual task. Yet, datasets commonly used to train and test computer vision algorithms have pervasive confounding factors. Such biases make it difficult to truly estimate the performance of those algorithms and how well computer vision models can extrapolate outside the distribution on which they were trained. In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. As a proof of principle, we consider three exemplary tasks: drinking, reading, and sitting. The best accuracies reached using state-of-the-art computer vision models were 61.7%, 62.8%, and 76.8%, respectively, while human participants scored above 90% accuracy on all three tasks. We propose a rigorous method to reduce confounds when creating datasets and when comparing human versus computer vision performance. Source code and datasets are publicly available.
DOI: 10.48550/arxiv.2003.13852
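
The headline numbers in the abstract (61.7%, 62.8%, and 76.8% model accuracy versus above 90% for humans) come from binary action classification: deciding whether an image depicts a person drinking, reading, or sitting. The sketch below illustrates the general shape of such an evaluation, fine-tuning a pretrained ImageNet backbone and measuring held-out accuracy. It is a minimal PyTorch/torchvision illustration under assumed data layout, backbone, and hyperparameters, not the authors' released code; their actual models, training regime, and confound-controlled datasets may differ.

```python
# Illustrative sketch (not the paper's code): fine-tune a pretrained
# ResNet-50 as a binary classifier for one action task (e.g. drinking
# vs. not drinking) and report held-out accuracy.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for a pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: drinking/{train,test}/{positive,negative}/*.jpg
train_set = datasets.ImageFolder("drinking/train", transform=preprocess)
test_set = datasets.ImageFolder("drinking/test", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)

# Replace the 1000-way ImageNet head with a 2-way head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Assumed training budget; the paper's setup may differ.
model.train()
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Held-out accuracy: the quantity compared against human performance.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"test accuracy: {correct / total:.1%}")
```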