First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new
task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for
the target machine types are unseen in training. Existing methods often rely on
the availability of normal and abnormal sound data from the target machines.
However, because no anomalous sound data are available for the target machine
types, adapting existing ASD methods to the first-shot task is challenging.
In this paper, we propose a new framework for first-shot unsupervised
ASD, in which metadata-assisted audio generation is used to estimate unknown
anomalies: the available machine information (i.e., metadata and sound data)
is used to fine-tune a text-to-audio generation model, which then generates
anomalous sounds with the acoustic characteristics specific to each
machine type. We then use the method of Time-Weighted Frequency
domain audio Representation with Gaussian Mixture Model (TWFR-GMM) as the
backbone to achieve the first-shot unsupervised ASD. Our proposed FS-TWFR-GMM
method achieves competitive performance amongst top systems in DCASE 2023
Challenge Task 2, while requiring only 1% of the model parameters for detection, as
validated in our experiments. |
---|---|
DOI: | 10.48550/arxiv.2310.14173 |
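
The summary names TWFR-GMM as the detection backbone but does not spell out how it works. The sketch below is only an illustration of the general idea, under stated assumptions: "time-weighted" is read here as rank-weighted pooling over the time axis (an assumption, not confirmed by this record), and a one-component, diagonal-covariance Gaussian stands in for the full Gaussian mixture model. All function names and the toy data are hypothetical, and the text-to-audio generation step is not covered.

```python
import math
import random


def time_weighted_pool(spectrogram, r):
    """Pool a [frames][bins] spectrogram over time into one vector per clip.

    Assumed scheme: per frequency bin, sort frame values in descending order
    and average them with weights r**0, r**1, ...
    r = 1 gives mean pooling, r = 0 gives max pooling.
    """
    weights = [r ** i for i in range(len(spectrogram))]
    norm = sum(weights)
    pooled = []
    for bin_values in zip(*spectrogram):  # iterate over frequency bins
        ranked = sorted(bin_values, reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, ranked)) / norm)
    return pooled


def fit_diag_gaussian(vectors, floor=1e-6):
    """Fit a diagonal Gaussian (stand-in for a GMM) to normal-sound vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, floor)
        for d in range(dim)
    ]
    return mean, var


def anomaly_score(vector, mean, var):
    """Negative log-likelihood; a higher value means more anomalous."""
    return sum(
        0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
        for x, m, s in zip(vector, mean, var)
    )


def toy_clip(rng, level, n_frames=20, n_bins=4):
    """Synthetic stand-in for a log-mel spectrogram (not real machine audio)."""
    return [[rng.gauss(level, 0.1) for _ in range(n_bins)] for _ in range(n_frames)]


rng = random.Random(0)
# Train on "normal" clips only, as in the unsupervised setting.
normal_train = [time_weighted_pool(toy_clip(rng, 0.0), r=0.9) for _ in range(50)]
mean, var = fit_diag_gaussian(normal_train)

score_normal = anomaly_score(time_weighted_pool(toy_clip(rng, 0.0), 0.9), mean, var)
score_shifted = anomaly_score(time_weighted_pool(toy_clip(rng, 1.0), 0.9), mean, var)
print(score_shifted > score_normal)
```

In this toy setting the model sees only normal clips during fitting, and a clip whose energy level is shifted receives a higher score than an unseen normal clip. A real system would replace `toy_clip` with spectrograms computed from audio and would likely use a multi-component mixture.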