Fast Deep Neural Networks With Knowledge Guided Training and Predicted Regions of Interests for Real-Time Video Object Detection
Published in: | IEEE Access 2018-01, Vol.6, p.8990-8999 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | It has been recognized that deeper and wider neural networks are continuously advancing the state-of-the-art performance of various computer vision and machine learning tasks. However, they often require large sets of labeled data for effective training and suffer from extremely high computational complexity, which prevents them from being deployed in real-time systems, for example, vehicle detection from vehicle-mounted cameras for assisted driving. In this paper, we aim to develop a fast deep neural network for real-time video object detection by exploring the ideas of knowledge-guided training and predicted regions of interest. Specifically, we develop a new framework for training deep neural networks on datasets with limited labeled samples using cross-network knowledge projection, which improves the network performance while significantly reducing the overall computational complexity. A large pre-trained teacher network is used to observe samples from the training data. A projection matrix is learned to project this teacher-level knowledge and its visual representations from an intermediate layer of the teacher network to an intermediate layer of a thinner and faster student network, where it guides and regulates the training process. To further speed up the network, we propose to train a low-complexity object detector using traditional machine learning methods, such as a support vector machine. Using this low-complexity object detector, we identify the regions of interest that contain the target objects with high confidence. We obtain a mathematical formula to estimate the regions of interest at each convolution layer, so that computation outside them can be skipped. Our experimental results on vehicle detection from videos demonstrate that the proposed method is able to speed up the network by up to 16 times while maintaining the object detection performance. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2018.2795798 |
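
The abstract mentions a formula for estimating the region of interest at each convolution layer but does not state it. As a rough illustration only, the sketch below uses ordinary receptive-field geometry, which may differ from the paper's formula: along one spatial axis, it keeps every output position whose input window overlaps the region of interest. The function names `roi_through_conv` and `propagate_roi` and the `(kernel, stride, padding)` layer description are assumptions made for this example.

```python
def roi_through_conv(lo, hi, in_size, kernel, stride=1, padding=0):
    """Map an inclusive input interval [lo, hi] to the inclusive interval of
    output positions whose input windows overlap it (one spatial axis).

    Assumption: standard convolution geometry, not the paper's own formula.
    """
    out_size = (in_size + 2 * padding - kernel) // stride + 1
    # Output position o reads inputs o*stride - padding
    # through o*stride - padding + kernel - 1.
    first = -(-(lo + padding - kernel + 1) // stride)  # ceiling division
    last = (hi + padding) // stride                    # floor division
    # Clamp to the valid output range.
    first = max(first, 0)
    last = min(last, out_size - 1)
    return first, last, out_size


def propagate_roi(lo, hi, in_size, layers):
    """Push a 1-D region of interest through a stack of convolution layers.

    `layers` is a list of (kernel, stride, padding) tuples. Returns one tuple
    per layer: (first, last, out_size, fraction of outputs to compute).
    """
    report = []
    for kernel, stride, padding in layers:
        lo, hi, in_size = roi_through_conv(lo, hi, in_size, kernel, stride, padding)
        fraction = max(hi - lo + 1, 0) / in_size
        report.append((lo, hi, in_size, fraction))
    return report


if __name__ == "__main__":
    # Hypothetical three-layer stack on a 224-pixel-wide input.
    stack = [(3, 1, 1), (3, 2, 1), (3, 1, 1)]
    for first, last, size, frac in propagate_roi(64, 127, 224, stack):
        print(f"outputs {first}..{last} of {size} ({frac:.0%} of the axis)")
```

For a 2-D feature map, the same mapping can be applied independently to the horizontal and vertical axes; the fraction of outputs to compute is then the product of the two per-axis fractions. Keeping every overlapping output is the conservative choice, so the region grows slightly with each layer.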