Parking Space Status Inference Upon a Deep CNN and Multi-Task Contrastive Network With Spatial Transform
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2019-04, Vol. 29 (4), pp. 1194-1208 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Summary: | Deep learning methods, especially CNNs, have achieved promising results in a wide range of computer vision applications. However, few studies have focused on designing deep learning methods suited to parking space status inference. Detecting parking spaces in an outdoor environment is known to be challenging due to dynamic lighting variations, weather changes, and perspective distortion. Off-the-shelf CNNs might handle lighting variations well. However, to realize a practical and robust inference system, we also need to address troublesome problems such as parking displacements, non-uniform car sizes, inter-object occlusion, and perspective distortion. These problems may become even more challenging when differences in space sizes are also considered. To overcome these problems, we proposed a custom-tailored deep convolutional and contrastive network with three contributions. First, we introduced a Siamese architecture to learn a contrastive and robust feature descriptor, which helps to reduce the effects of the variety of inter-object occlusion. Second, we integrated a convolutional Spatial Transformer Network (STN) to adaptively transform a 3-space input patch according to vehicle sizes and parking displacement. The STN also helps to overcome the perspective distortion problem. Third, we designed a multi-task loss function that trains the network by simultaneously considering the accuracy of inferring the status of the target space and the semantic smoothness of high-level features, thereby alleviating the errors caused by inter-object occlusion. To verify the proposed network, we visualized the learned features and analyzed their functionality. Experiments and evaluations have shown the robustness of our system in parking status inference. The real-time system currently running in public parking lots also demonstrates the effectiveness of the proposed deep network. |
---|---|
ISSN: | 1051-8215, 1558-2205 |
DOI: | 10.1109/TCSVT.2018.2826053 |
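
The summary describes a multi-task loss that combines status-classification accuracy with a contrastive term learned through a Siamese architecture. This record does not give the actual formula, so the following is only a minimal sketch of that general idea. It assumes binary cross-entropy for the status term and the classic pairwise contrastive loss for the feature term. The function names, the `weight` balance factor, and the `margin` value are hypothetical and are not taken from the article.

```python
import math


def cross_entropy(p_occupied, label):
    """Binary cross-entropy for one space; label is 1 (occupied) or 0 (vacant)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_occupied, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def contrastive(feat_a, feat_b, same_status, margin=1.0):
    """Classic pairwise contrastive loss on two feature vectors.

    Pairs with the same status are pulled together; pairs with different
    status are pushed apart until their distance reaches `margin`.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    if same_status:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def multi_task_loss(p_a, y_a, p_b, y_b, feat_a, feat_b, weight=0.5, margin=1.0):
    """Status loss for both Siamese branches plus a weighted contrastive term.

    `weight` and `margin` are illustrative assumptions, not the article's values.
    """
    status_term = cross_entropy(p_a, y_a) + cross_entropy(p_b, y_b)
    feature_term = contrastive(feat_a, feat_b, y_a == y_b, margin)
    return status_term + weight * feature_term
```

In this sketch, two patches with the same status and identical features contribute only the status term, while two patches with different status are penalized when their features lie closer together than `margin`.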