Vision Transformer in Industrial Visual Inspection


Bibliographic Details
Published in: Applied Sciences, 2022-12, Vol. 12 (23), p. 11981
Main authors: Hütten, Nils; Meyes, Richard; Meisen, Tobias
Format: Article
Language: English
Online access: Full text
Abstract: Artificial intelligence as an approach to visual inspection in industrial applications has been considered for decades. Recent successes, driven by advances in deep learning, represent a potential paradigm shift and could enable automated visual inspection even under complex environmental conditions. Convolutional neural networks (CNNs) have been the de facto standard in deep-learning-based computer vision (CV) for the last 10 years. Recently, attention-based vision transformer architectures emerged and surpassed the performance of CNNs on benchmark datasets for common CV tasks such as image classification, object detection, and segmentation. Nevertheless, despite their outstanding results, the application of vision transformers to real-world visual inspection remains sparse. We suspect that this is largely due to the assumption that they require enormous amounts of data to be effective. In this study, we evaluate this assumption. To this end, we perform a systematic comparison of seven widely used state-of-the-art CNN- and transformer-based architectures trained on three different use cases in the domain of visual damage assessment for railway freight car maintenance. We show that vision transformer models achieve at least equivalent performance to CNNs in industrial applications with sparse data available, and significantly surpass them in increasingly complex tasks.
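
The kind of comparison the abstract describes can be pictured as fine-tuning a pretrained vision transformer and a pretrained CNN on the same small, labeled inspection dataset and comparing their validation scores. The sketch below is a minimal illustration of that setup, not the authors' pipeline; the `timm` model names, the dataset path, and all hyperparameters are assumptions chosen for illustration.

```python
# Minimal sketch (not the paper's code): fine-tune a pretrained ViT and a
# pretrained CNN on the same small image-classification dataset and compare them.
# Assumptions: `timm` and `torchvision` are installed, and "data/inspection"
# (a hypothetical path) contains class-labeled image folders.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard ImageNet-style preprocessing; both backbones expect 224x224 inputs.
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("data/inspection/train", transform=tf)
val_ds = datasets.ImageFolder("data/inspection/val", transform=tf)
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=16)

def finetune(model_name: str, epochs: int = 10) -> float:
    """Fine-tune one pretrained backbone and return its validation accuracy."""
    model = timm.create_model(model_name, pretrained=True,
                              num_classes=len(train_ds.classes)).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for x, y in train_dl:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # Evaluate on the held-out split.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in val_dl:
            preds = model(x.to(device)).argmax(dim=1).cpu()
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total

# Same data, same training budget, two architecture families.
for name in ["resnet50", "vit_base_patch16_224"]:
    print(name, finetune(name))
```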
ISSN: 2076-3417
DOI: 10.3390/app122311981