Ensemble of ghost convolution block with nested transformer encoder for dense object recognition
Published in: Biomedical Signal Processing and Control, 2024-02, Vol. 88, p. 105645, Article 105645
Main authors: ,
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract highlights:
• We design an efficient deep learning model to detect dense objects.
• We introduce a GCB-based nested transformer in a feature refinement network.
• The GCB reduces complexity while the nested transformer extracts features based on MHSA.
• Experimental evaluations are carried out on the VOC, SKU-110K, and GWHD datasets.
• We compare our detection results with state-of-the-art models.
Technological advancement and innovation are progressing at a rapid pace, and in the current computerized era recognition models have achieved outstanding performance in object detection. Dense-packed object detection, however, remains one of the greatest challenges because of the complexity of redundant feature-map computation, the diversity of object shapes, and the misalignment of objects in various directions. This paper proposes a GCB (Ghost Convolution Block)-based nested-transformer encoder block in the feature refinement network to overcome these difficulties. The GCB alleviates the redundant feature-map computation by using a DW (depth-wise separable) convolution operation, while the nested-transformer encoder block extracts in-depth information from diversely shaped and misaligned objects through the query, key, and value parameters of the MHSA (multi-head self-attention) mechanism. We perform quantitative evaluations on the VOC, GWHD (Global Wheat Head Detection), and SKU-110K datasets and carry out an ablation study using the YOLOv5 model with the GCB and GCBTR (Ghost Convolution Block-based Transformer) modules. Compared with conventional models, our model achieves 84.2% mAP, 80% precision, 78.1% recall, and 79% F1-score on VOC; 81.8% mAP, 91.6% precision, 73.1% recall, and 81.3% F1-score on SKU-110K; and 95.7% mAP, 94.4% precision, and 90.2% recall on GWHD. These results show that the proposed model outperforms existing models as well as the baseline YOLOv5 model in detecting dense objects.
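The abstract names two mechanisms: a ghost convolution block that curbs redundant feature-map computation through a cheap depth-wise separable convolution, and a transformer encoder that relates features through the query, key, and value projections of multi-head self-attention. The PyTorch sketch below illustrates only these two generic ingredients; the class names GhostConvBlock and MHSAEncoderLayer, the channel sizes, and the way they are chained are illustrative assumptions, not the authors' exact GCB or GCBTR modules.

```python
import torch
import torch.nn as nn

class GhostConvBlock(nn.Module):
    """Ghost convolution sketch: a costly primary 1x1 conv produces a fraction of
    the output channels; a cheap depth-wise conv generates the remaining 'ghost'
    feature maps, so fewer dense convolutions are computed overall."""
    def __init__(self, in_ch, out_ch, ratio=2):
        super().__init__()
        assert out_ch % ratio == 0, "sketch assumes out_ch divisible by ratio"
        primary_ch = out_ch // ratio          # channels from the dense primary conv
        ghost_ch = out_ch - primary_ch        # channels from the cheap depth-wise conv
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(           # depth-wise: one 3x3 filter per channel
            nn.Conv2d(primary_ch, ghost_ch, kernel_size=3, padding=1,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(ghost_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)   # [B, out_ch, H, W]

class MHSAEncoderLayer(nn.Module):
    """Transformer-style encoder layer: multi-head self-attention over the
    flattened spatial positions of a feature map (query = key = value)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, x):                              # x: [B, C, H, W]
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)               # tokens: [B, H*W, C]
        q = k = v = self.norm1(t)                      # query/key/value from the same tokens
        t = t + self.attn(q, k, v, need_weights=False)[0]
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)   # back to [B, C, H, W]

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)                     # dummy feature map
    feats = MHSAEncoderLayer(dim=128)(GhostConvBlock(64, 128)(x))
    print(feats.shape)                                 # torch.Size([1, 128, 32, 32])
```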
ISSN: 1746-8094
DOI: 10.1016/j.bspc.2023.105645