Urban Street Scene Instance Segmentation: An Integrated Hybrid Network Merging Top-Down and Bottom-Up Strategies

Bibliographic details
Published in: Engineering Letters, May 2024, Vol. 32 (5), p. 1043
Authors: Zhou, Ruifa; Zhao, Ji
Format: Article
Language: English
Abstract: There are two standard approaches to instance segmentation: top-down and bottom-up. The top-down approach first performs object detection to generate candidate proposals and then performs pixel-level segmentation within each proposal. It is accurate and flexible and can handle objects of different sizes and shapes, but it is computationally complex and depends on the accuracy of the object detector. The bottom-up approach first performs pixel-level clustering or segmentation and then groups the results into candidate instances to obtain the final segmentation. It can handle overlapping objects and has lower computational complexity, but it may fail to localize and segment instances accurately, and its segmentation granularity is coarser.
This paper proposes the Urban Street Scene Instance Segmentation (UISNet) algorithm. First, the feature extraction network, which forms the foundation of UISNet, uses EfficientNet as its backbone. Second, MPAFPN serves as the feature pyramid network of UISNet and performs multi-scale feature fusion. Using EfficientNet as the backbone and MPAFPN as the bottleneck (neck) layers improves the accuracy of UISNet by 4% compared with ResNet and FPN. In the inference phase, the paper introduces a dual-branch design that combines the top-down and bottom-up strategies. One branch, the bounding box aggregation branch, is built on the FCOS Head and generates high-dimensional information about the bounding boxes, such as their shape and orientation. The other branch, the mask decoding branch, produces the mask predictions. The Mask FCN Header fuses the two branches to obtain the final instance segmentation result. With this dual-branch design, the model can exploit information from both the top-down and bottom-up approaches, improving the accuracy and robustness of instance segmentation.
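The abstract does not describe how the Mask FCN Header actually fuses the two branches. As a rough illustration of the general idea of combining a top-down box branch with a bottom-up mask branch, the sketch below assumes that the box branch emits FCOS-style (left, top, right, bottom) distances per feature location and that the mask branch emits one dense foreground probability map; each decoded box then crops that map into a per-instance mask. This cropping rule is a swapped-in, YOLACT-style simplification rather than UISNet's method, and the names `decode_fcos_box`, `crop_mask`, and `fuse_branches` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: NOT UISNet's Mask FCN Header. It only illustrates
# combining FCOS-style box predictions (top-down) with a shared dense
# mask map (bottom-up) by cropping the map with each decoded box.

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def decode_fcos_box(x: float, y: float, ltrb: Sequence[float]) -> Box:
    """FCOS-style decoding: location (x, y) regresses its distances
    (left, top, right, bottom) to the four sides of its object's box."""
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


def crop_mask(prob: List[List[float]], box: Box, thresh: float = 0.5) -> List[List[int]]:
    """Binarize the dense mask map (assumed bottom-up output) and keep only
    pixels whose centers fall inside the box (assumed top-down output)."""
    x1, y1, x2, y2 = box
    out: List[List[int]] = []
    for row_idx, row in enumerate(prob):
        cy = row_idx + 0.5  # pixel-center y coordinate
        out_row = []
        for col_idx, p in enumerate(row):
            cx = col_idx + 0.5  # pixel-center x coordinate
            inside = x1 <= cx <= x2 and y1 <= cy <= y2
            out_row.append(1 if inside and p >= thresh else 0)
        out.append(out_row)
    return out


def fuse_branches(
    detections: List[Tuple[float, float, Sequence[float], float]],
    prob: List[List[float]],
    score_thresh: float = 0.3,
) -> List[Tuple[Box, List[List[int]]]]:
    """Turn per-location detections (x, y, ltrb, score) plus one shared
    mask map into per-instance (box, mask) pairs."""
    instances = []
    for x, y, ltrb, score in detections:
        if score < score_thresh:
            continue  # drop low-confidence locations
        box = decode_fcos_box(x, y, ltrb)
        instances.append((box, crop_mask(prob, box)))
    return instances


if __name__ == "__main__":
    # A 4x6 toy probability map containing two separate blobs.
    prob = [
        [0.9, 0.8, 0.1, 0.0, 0.7, 0.9],
        [0.8, 0.9, 0.2, 0.1, 0.8, 0.9],
        [0.1, 0.1, 0.0, 0.0, 0.1, 0.2],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    dets = [
        (1.0, 1.0, (1.0, 1.0, 1.0, 1.0), 0.9),  # box (0, 0, 2, 2) -> left blob
        (5.0, 1.0, (1.0, 1.0, 1.0, 1.0), 0.8),  # box (4, 0, 6, 2) -> right blob
        (3.0, 3.0, (0.5, 0.5, 0.5, 0.5), 0.1),  # below threshold, discarded
    ]
    for box, mask in fuse_branches(dets, prob):
        print(box, mask)
```

Because the mask map is shared by all instances, two touching objects of the same class can only be separated by their boxes in this simplified scheme; a learned fusion head such as the one the paper describes is presumably meant to resolve such cases better.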
In experimental comparisons, the proposed network achieves the highest accuracy among the compared instance segmentation networks, at 36.28%. Moreover, the proposed model performs better in urban street scenes, improving object detection and segmentation and offering more reliable and efficient solutions for applications such as autonomous driving and intelligent transportation.
ISSN: 1816-093X, 1816-0948