Synthesizing Pose Sequences from 3D Assets for Vision-Based Activity Analysis
Published in: Journal of Computing in Civil Engineering, 2021-01, Vol. 35 (1)
Format: Article
Language: English
Online access: Full text
Abstract: In recent years, computer vision algorithms have been shown to leverage visual data from jobsites effectively for video-based activity analysis of construction equipment. However, earthmoving operations are restricted to site work and the surrounding terrain, and the presence of other structures, particularly in urban areas, limits the number of viewpoints from which operations can be recorded. These considerations lower the degree of intra-activity and inter-activity category variability to which such algorithms are exposed, hindering their ability to generalize to new jobsites. In addition, training computer vision algorithms typically relies on large quantities of hand-annotated ground truth; these annotations are burdensome to obtain and can offset the cost savings gained by automating activity analysis. The main contribution of this paper is a means of inexpensively generating synthetic data, based on virtual, kinematically articulated three-dimensional (3D) models of construction equipment, to improve the capabilities of vision-based activity analysis methods. The authors introduce an automated synthetic data generation method that outputs two-dimensional (2D) pose sequences for simulated excavator operations, varied by camera position relative to the excavator and by activity length and behavior. The presented method is validated by training a deep learning–based method on the synthesized 2D pose sequences and testing it on pose sequences from real-world excavator operations, achieving 75% precision and 71% recall. This exceeds the 66% precision and 65% recall obtained when training and testing the deep learning–based method on the real-world data via cross-validation. Limited access to reliable quantities of real-world data thus incentivizes using synthetically generated data to train vision-based activity analysis algorithms.
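The abstract outlines the core idea: drive a kinematically articulated 3D equipment model through simulated operations and project its joints into 2D from varying camera positions. As a rough illustration of that idea only, the minimal Python sketch below computes forward kinematics for a simplified four-link excavator arm and projects the resulting keypoints through a pinhole camera. The link lengths, keypoint set, camera model, and all names (e.g., excavator_joints_3d, project_to_2d) are hypothetical assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative link lengths in metres (base offset, boom, stick, bucket tip).
# These values are hypothetical, not taken from the paper.
LINK_LENGTHS = [1.2, 5.7, 2.9, 1.5]

def excavator_joints_3d(boom, stick, bucket, swing):
    """Forward kinematics of a simplified planar excavator arm.

    Joint angles are in radians and accumulate along the chain; `swing`
    then rotates the whole plane about the vertical (z) axis to mimic
    cab rotation. Returns an array of shape (5, 3): one 3D point per
    keypoint (base, boom pivot, stick pivot, bucket pivot, bucket tip).
    """
    joints = [np.zeros(3)]
    angle = 0.0
    for length, rel in zip(LINK_LENGTHS, [0.0, boom, stick, bucket]):
        angle += rel  # each joint angle is relative to the previous link
        step = length * np.array([np.cos(angle), 0.0, np.sin(angle)])
        joints.append(joints[-1] + step)
    c, s = np.cos(swing), np.sin(swing)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return np.stack(joints) @ rz.T

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation for a camera at cam_pos looking at target.

    Assumes cam_pos is not directly above the target (the cross product
    would otherwise degenerate).
    """
    fwd = target - cam_pos
    fwd /= np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    return np.stack([right, true_up, fwd])  # rows are camera axes

def project_to_2d(points_3d, cam_pos, focal=1000.0, centre=(640.0, 360.0)):
    """Pinhole projection of 3D keypoints to pixel coordinates (u, v)."""
    cam_pos = np.asarray(cam_pos, dtype=float)
    rot = look_at(cam_pos)
    cam_pts = (points_3d - cam_pos) @ rot.T  # express points in camera frame
    depth = cam_pts[:, 2]                    # forward distance from the camera
    u = focal * cam_pts[:, 0] / depth + centre[0]
    v = focal * cam_pts[:, 1] / depth + centre[1]
    return np.stack([u, v], axis=1)

# One synthetic frame of a digging-like configuration, viewed from a
# randomly chosen camera position; sweeping the angles over time and
# projecting every frame yields a 2D pose sequence for training.
rng = np.random.default_rng(0)
cam = rng.uniform([8.0, 8.0, 2.0], [20.0, 20.0, 6.0])
frame = excavator_joints_3d(boom=0.9, stick=-1.4, bucket=-0.8, swing=0.3)
print(project_to_2d(frame, cam))  # 5 keypoints x (u, v)
```

Repeating this over many camera positions, activity durations, and angle trajectories is one plausible way to produce the viewpoint and behavior variability the abstract says real jobsites cannot provide.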
ISSN: 0887-3801, 1943-5487
DOI: 10.1061/(ASCE)CP.1943-5487.0000937