An annotated image database of building facades categorized into land uses for object detection using deep learning: Case study for the city of Vila Velha-ES, Brazil

Bibliographic Details
Published in: Machine Vision and Applications, 2022-09, Vol. 33 (5), Article 80
Authors: Bortoloti, Frederico Damasceno; Tavares, Jonivane; Rauber, Thomas Walter; Ciarelli, Patrick Marques; Botelho, Rayane Cardozo Gama
Format: Article
Language: English
Online access: Full text
Description
Abstract: This article presents a machine learning approach to automatic land use categorization based on a convolutional artificial neural network architecture. It is intended to support the detection and classification of building facades in order to associate each building with its respective land use. Replacing the time-consuming manual acquisition of images in the field and subsequent interpretation of the data with computer-aided techniques facilitates the creation of useful maps for urban planning. A specific future objective of this study is to monitor the commercial evolution in the city of Vila Velha, Brazil. The initial step is object detection based on a deep network architecture called Faster R-CNN. The model is trained on a collection of street-level photographs of buildings of the desired land uses, drawn from a database of annotated images of building facades. Images are extracted from Google Street View scenes. Furthermore, in order to save manual annotation time, a semi-supervised dual-pipeline method is proposed that uses a pre-trained predictor model from the Places365 database to learn from unannotated images. Several backbones were connected to the Faster R-CNN architecture for comparison. The experimental results with the VGG backbone show an improvement over published works, with an average accuracy of 86.49%.
ISSN: 0932-8092
eISSN: 1432-1769
DOI: 10.1007/s00138-022-01335-5
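
The abstract describes a Faster R-CNN detector with a VGG backbone trained on street-level facade images, but the record does not tie the work to any particular implementation framework. Below is a minimal, illustrative sketch of how such a detector could be assembled with PyTorch/torchvision; the framework choice, the number of land-use classes, and the anchor settings are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# VGG-16 convolutional feature extractor used as the detection backbone.
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
backbone.out_channels = 512  # depth of VGG-16's final feature maps

# Single-feature-map anchor generator (sizes/ratios are illustrative).
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)

# RoI pooling over the single backbone feature map (key "0").
roi_pooler = torchvision.ops.MultiScaleRoIAlign(
    featmap_names=["0"], output_size=7, sampling_ratio=2
)

num_classes = 6  # hypothetical: land-use categories + background

model = FasterRCNN(
    backbone,
    num_classes=num_classes,
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=roi_pooler,
)
model.eval()

# Inference on one street-level image tensor (C, H, W), values in [0, 1].
image = torch.rand(3, 600, 800)
with torch.no_grad():
    detections = model([image])[0]  # dict with 'boxes', 'labels', 'scores'
```

Note that `out_channels` must match the depth of the backbone's final feature maps (512 for VGG-16). In the semi-supervised setting sketched in the abstract, pseudo-labels produced by a Places365-pretrained classifier could supply training targets for the unannotated portion of the image collection; the details of that dual pipeline are given in the article itself.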