Scene Categorization Model Using Deep Visually Sensitive Features
Published in: IEEE Access, 2019, Vol. 7, pp. 45230-45239
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Full Text
Abstract: Visually sensitive regions in a scene are thought to be important for scene categorization. In this paper, we propose to utilize the important visually sensitive information represented by deep features for scene categorization. Specifically, the context relationship between objects and their surroundings is fully utilized as the main basis for judging the content of the scene, and, combined with deep convolutional neural networks (CNNs), a scene categorization model based on deep visually sensitive features is constructed. First, the saliency regions of the scene images are marked according to a context-based saliency detection algorithm. Then, the original images and the corresponding visually sensitive region detection images are superimposed to obtain the visually sensitive region enhancement images. Next, the deep convolutional features of the original images, the visually sensitive region detection images, and the visually sensitive region enhancement images are extracted through deep CNNs pre-trained on the large-scale scene dataset Places. Finally, considering that the deep features extracted by different layers of the convolutional network have different discriminative power, fused features are generated from multiple convolutional layers to construct the visually sensitive CNN model (VS-CNN). To verify the effectiveness of the proposed model, experiments are conducted on five standard scene datasets: LabelMe, UIUC-Sports, Scene-15, MIT67, and SUN. The experimental results show that the proposed model is effective and has good adaptability. In particular, its categorization performance is superior to many state-of-the-art methods on complex indoor scenes.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2908448
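Two steps of the pipeline described in the abstract can be sketched in code: superimposing an image with its saliency (visually sensitive region) map to form an enhancement image, and fusing features pooled from multiple convolutional layers. This is a minimal NumPy sketch under assumed conventions; the function names, the pixel-wise weighting rule, and global-average-pooling as the fusion step are illustrative choices, not the paper's exact formulation (which uses CNNs pre-trained on Places).

```python
import numpy as np

def enhance_with_saliency(image, saliency):
    # Superimpose the original image and its saliency map to obtain a
    # visually sensitive region enhancement image. Assumed rule:
    # pixel-wise weighting that boosts salient regions.
    saliency = saliency / (saliency.max() + 1e-8)   # normalize to [0, 1]
    return np.clip(image * (1.0 + saliency[..., None]), 0, 255)

def fuse_layer_features(feature_maps):
    # Global-average-pool each layer's (H, W, C) feature map and
    # concatenate the pooled vectors, approximating multi-layer fusion.
    pooled = [fm.mean(axis=(0, 1)) for fm in feature_maps]
    return np.concatenate(pooled)

# Toy demo with random data standing in for a real image and CNN outputs.
img = np.random.rand(8, 8, 3) * 255
sal = np.random.rand(8, 8)
enh = enhance_with_saliency(img, sal)               # shape (8, 8, 3)
feats = fuse_layer_features([np.random.rand(4, 4, 16),
                             np.random.rand(2, 2, 32)])  # shape (48,)
```

In practice the feature maps would come from intermediate layers of a Places-pretrained CNN rather than random arrays, and the fused vector would feed a scene classifier.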