Object Tracking in Satellite Videos Based on Convolutional Regression Network With Appearance and Motion Features

Object tracking is one of the most important components in numerous applications of computer vision. Remote sensing videos provided by commercial satellites make it possible to extend this topic into the earth observation domain. In satellite videos, typical moving targets like vehicles and planes o...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE journal of selected topics in applied earth observations and remote sensing 2020, Vol.13, p.783-793
Hauptverfasser: Hu, Zhaopeng, Yang, Daiqin, Zhang, Kao, Chen, Zhenzhong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Object tracking is one of the most important components in numerous applications of computer vision. Remote sensing videos provided by commercial satellites make it possible to extend this topic into the earth observation domain. In satellite videos, typical moving targets like vehicles and planes only cover a small area of pixels, and they could easily be confused with surrounding complex ground scenes. Similar objects nearby in satellite videos can hardly be differed by appearance details due to the resolution constraint. Thus, tracking drift caused by distractions is also a thorny problem. Facing challenges, traditional tracking methods such as correlation filters with hand-crafted visual features achieve unsatisfactory results in satellite videos. Methods based on deep neural networks have demonstrated their superiority in various ordinary visual tracking benchmarks, but their results on satellite videos remain unexplored. In this article, deep learning technologies are applied to object tracking in satellite videos for better performance. A simple regression network is used to combine a regression model with convolutional layers and a gradient descent algorithm. The regression network fully exploits the abundant background context to learn a robust tracker. Instead of handcrafted features, both appearance features and motion features, which are extracted by pretrained deep neural networks, are used for accurate object tracking. In cases when the tracker encounters ambiguous appearance information, the motion features could provide complementary and discriminative information to improve tracking performances. Experimental results on various satellite videos show that the proposed method achieves better tracking performance than other state-of-the-arts.
ISSN:1939-1404
2151-1535
DOI:10.1109/JSTARS.2020.2971657