Adapted Spectrogram Transformer for Unsupervised Cross-Domain Acoustic Anomaly Detection

Bibliographic details
Main authors: Van De Vyver, Gilles; Liu, Zhaoyi; Dolui, Koustabh; Michiels, Sam; Hughes, Danny
Format: Conference proceeding
Language: English
Description
Summary: Anomaly detection models can help to automatically and proactively detect faults in industrial machines. Microphones are appealing because they are generally inexpensive and, unlike visual inspection, sound recordings can give information about the internals of the machine. However, conventional methods based on an AutoEncoder (AE) structure learned from scratch generally struggle to learn to robustly reconstruct samples when little data is available. This paper addresses this problem by presenting a method for unsupervised Acoustic Anomaly Detection (AAD) that adapts intermediate embeddings from a pretrained, self-attention-based spectrogram transformer. Transfer learning from a large, successful model offers a solution to learning with limited data by reusing external knowledge. For AAD, this can help to recognize subtle anomalies. This work proposes two method variants that take advantage of Intermediate Feature Embeddings (IFEs) from the Audio Spectrogram Transformer (AST). The first fits a Gaussian Mixture Model (GMM) on the IFEs produced by intermediate layers of the AST. We call this ADIFAST: Anomaly Detection from Intermediate Features extracted from AST. The second uses the IFEs in a different, more effective way by adapting the AST to an AE structure. We call it TELD: Transformer Encoder Linear Decoder network. Both variants build on the IFEs extracted by the AST. Evaluating TELD on Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge gives an average improvement in the Area Under Curve (AUC) score of 3.9% for binary labeling of normal and anomalous samples in the target domain.
ISSN: 2640-0103
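
Illustrative sketch: The ADIFAST idea in the summary, fitting a GMM on embeddings of normal samples and flagging samples the model finds unlikely, can be sketched roughly as follows. This is not the authors' implementation. The embeddings here are synthetic random vectors standing in for IFEs from the AST, and the embedding size, number of mixture components, covariance type, and the use of negative log-likelihood as the anomaly score are all assumptions that the summary does not specify.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
EMBED_DIM = 16  # hypothetical size; real AST embeddings are larger

# Synthetic stand-ins for intermediate feature embeddings (IFEs).
# In practice these would come from intermediate layers of a pretrained AST.
train_normal = rng.normal(0.0, 1.0, size=(500, EMBED_DIM))
test_normal = rng.normal(0.0, 1.0, size=(100, EMBED_DIM))
test_anomalous = rng.normal(3.0, 1.0, size=(100, EMBED_DIM))


def fit_gmm(embeddings, n_components=2, seed=0):
    """Fit a GMM on embeddings of normal samples only (unsupervised)."""
    gmm = GaussianMixture(
        n_components=n_components,  # assumed value, not from the paper
        covariance_type="full",     # assumed value, not from the paper
        random_state=seed,
    )
    gmm.fit(embeddings)
    return gmm


def anomaly_scores(gmm, embeddings):
    """Higher score = less likely under the model of normal data."""
    return -gmm.score_samples(embeddings)


gmm = fit_gmm(train_normal)

test_embeddings = np.vstack([test_normal, test_anomalous])
labels = np.concatenate(
    [np.zeros(len(test_normal)), np.ones(len(test_anomalous))]
)  # 0 = normal, 1 = anomalous

scores = anomaly_scores(gmm, test_embeddings)
auc = roc_auc_score(labels, scores)
print(f"AUC on synthetic data: {auc:.3f}")
```

Because the synthetic anomalies are shifted far from the normal cluster, the AUC printed here says nothing about performance on real machine sounds; the sketch only shows how an AUC such as the one reported in the summary is computed from per-sample anomaly scores.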