Unsupervised Visual-to-Geometric Feature Reconstruction for Vision-Based Industrial Anomaly Detection


Bibliographic Details
Published in: IEEE Access 2025, Vol. 13, pp. 3667-3682
Main Authors: Hoang, Dinh-Cuong, Tan, Phan Xuan, Nguyen, Anh-Nhat, Tran, Duc-Thanh, Duong, Van-Hiep, Mai, Anh-Truong, Pham, Duc-Long, Phan, Khanh-Toan, Do, Minh-Quang, Duong, Ta Huu Anh, Huynh, Tuan-Minh, Bui, Son-Anh, Nguyen, Duc-Manh, Trinh, Viet-Anh, Tran, Khanh-Duong, Nguyen, Thu-Uyen
Format: Article
Language: English
Description
Abstract: Industrial anomaly detection involves identifying abnormal regions in products and plays a crucial role in quality inspection. While 2D image-based anomaly detection has been extensively explored, combining two-dimensional (2D) images with three-dimensional (3D) point clouds remains less studied. Existing multimodal methods often combine features from different modalities, leading to feature interference and degraded performance. To overcome this, we propose a novel framework for unsupervised industrial anomaly detection that leverages both visual and geometric information. Specifically, we use pre-trained 2D and 3D models to extract visual features from color images and geometric features from 3D point clouds. Instead of directly fusing these features, we propose a geometric feature reconstruction network that predicts 3D geometric features from the 2D visual features. During training, we minimize the difference between the predicted geometric features and the extracted geometric features, enabling the model to learn how 2D appearance correlates with 3D structure in anomaly-free images. During inference, this learned relationship allows the model to detect anomalies: significant discrepancies between the reconstructed and actual geometric features indicate abnormal regions. Evaluated on the MVTec 3D-AD dataset, our method achieves state-of-the-art performance with an average image-level AUROC score of 0.968, surpassing previous approaches. Additionally, it provides fast inference at 8.2 frames per second with a memory footprint of only 1045 MB, making it highly efficient for industrial applications.
ISSN:2169-3536
DOI:10.1109/ACCESS.2025.3525567
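
The train-then-compare idea in the abstract can be illustrated with a small sketch. This is not the authors' network: the reconstruction model is replaced here by a ridge-regularised linear map, the features are synthetic random vectors standing in for the outputs of the pre-trained 2D and 3D extractors, and all names (`fit_reconstructor`, `anomaly_scores`, `true_map`) are hypothetical. It only shows the principle of fitting a visual-to-geometric predictor on anomaly-free data and scoring test patches by reconstruction error.

```python
import numpy as np


def fit_reconstructor(visual, geometric, ridge=1e-3):
    """Fit a linear map W (ridge regression) so that visual @ W ~ geometric.

    Stand-in for the geometric feature reconstruction network; trained
    only on anomaly-free samples.
    """
    dim = visual.shape[1]
    gram = visual.T @ visual + ridge * np.eye(dim)
    return np.linalg.solve(gram, visual.T @ geometric)


def anomaly_scores(visual, geometric, weights):
    """Per-patch score: L2 distance between predicted and actual geometry."""
    predicted = visual @ weights
    return np.linalg.norm(predicted - geometric, axis=1)


rng = np.random.default_rng(0)

# Assumption: normal samples follow a fixed appearance-to-geometry relation.
true_map = rng.normal(size=(8, 4))
train_visual = rng.normal(size=(500, 8))
train_geometric = train_visual @ true_map + 0.01 * rng.normal(size=(500, 4))

# Training: minimise the gap between predicted and extracted geometric features.
weights = fit_reconstructor(train_visual, train_geometric)

# Inference: ten test patches, with a simulated geometric defect in patch 3.
test_visual = rng.normal(size=(10, 8))
test_geometric = test_visual @ true_map + 0.01 * rng.normal(size=(10, 4))
test_geometric[3] += 2.0

scores = anomaly_scores(test_visual, test_geometric, weights)
image_score = scores.max()  # image-level score taken as the worst patch
print("most anomalous patch:", int(np.argmax(scores)))
```

Taking the maximum patch score as the image-level score is one common convention and is an assumption here; the abstract does not state how the paper aggregates patch scores.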