AENet: attention enhancement network for industrial defect detection in complex and sensitive scenarios

Bibliographic Details
Published in: The Journal of Supercomputing, 2024, Vol. 80 (9), p. 11845-11868
Main Authors: Wan, Yi; Yi, Lingjie; Jiang, Bo; Chen, Junfan; Jiang, Yi; Xie, Xianzhong
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: Conventional image processing and machine learning based on handcrafted features struggle to meet the real-time and high-accuracy requirements of industrial defect detection in complex, sensitive, and dynamic environments. To address this issue, this paper proposes AENet, a novel real-time defect detection network based on an encoder-decoder model, which achieves high detection accuracy and efficiency while demonstrating excellent convergence and generalization. First, a spatial channel attention module in the encoding network is designed to exploit both spatial attention and channel attention using a multi-head 3D self-attention mechanism, improving parallelism and detection efficiency. Second, the decoding network of AENet incorporates a cross-level attention fusion module, which fuses input features from different layers. Combined with a multi-level upsampling design, the decoder enhances the representation of defect details. Furthermore, we insert a simplified aggregator into the encoder-decoder network to extract feature information at different scales with low computational cost. This aggregation process aids training and inference on industrial defect datasets by incorporating contextual information. Extensive experimental results demonstrate that AENet outperforms other segmentation models at defect recognition and segmentation in challenging optical environments. It exhibits faster convergence than other networks and a good balance between accuracy and speed, achieving a recognition accuracy of over 96% for almost all types of defects in an actual industrial environment on an NVIDIA Tesla V100 GPU.
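To illustrate the general idea of weighting features along both the spatial and channel dimensions, the following is a minimal NumPy sketch of a generic spatial-channel attention operation. It is a hypothetical illustration only, not AENet's actual module: the paper's multi-head 3D self-attention design is not described in this record, so the pooling-plus-softmax scheme below is an assumption chosen for brevity.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(v - v.max())
    return e / e.sum()

def spatial_channel_attention(x):
    """Illustrative spatial-channel re-weighting (NOT AENet's actual design).

    x: feature map of shape (C, H, W).
    Returns a feature map of the same shape, scaled by channel weights
    (from global average pooling) and spatial weights (from the
    channel-mean map), each normalized with a softmax.
    """
    C, H, W = x.shape
    # Channel attention: one weight per channel.
    channel_w = softmax(x.mean(axis=(1, 2)))                    # (C,)
    # Spatial attention: one weight per spatial position.
    spatial_w = softmax(x.mean(axis=0).ravel()).reshape(H, W)   # (H, W)
    # Broadcast both weightings over the input feature map.
    return x * channel_w[:, None, None] * spatial_w[None, :, :]

x = np.random.rand(8, 16, 16).astype(np.float32)
y = spatial_channel_attention(x)
print(y.shape)  # (8, 16, 16)
```

In encoder-decoder segmentation networks, such attention maps are typically applied at multiple encoder stages so that the decoder receives features already biased toward defect-relevant channels and regions.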
ISSN: 0920-8542
1573-0484
DOI: 10.1007/s11227-024-05898-0