The Benefit Of Temporally-Strong Labels In Audio Event Classification
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | To reveal the importance of temporal precision in ground truth audio event
labels, we collected precise (~0.1 sec resolution) "strong" labels for a
portion of the AudioSet dataset. We devised a temporally strong evaluation set
(including explicit negatives of varying difficulty) and a small strong-labeled
training subset of 67k clips (compared to the original dataset's 1.8M clips
labeled at 10 sec resolution). We show that fine-tuning with a mix of weakly and
strongly labeled data can substantially improve classifier performance, even
when evaluated using only the original weak labels. For a ResNet50
architecture, d' on the strong evaluation data including explicit negatives
improves from 1.13 to 1.41. The new labels are available as an update to
AudioSet. |
DOI: | 10.48550/arxiv.2105.07031 |
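
The summary reports results as d' but this record does not say how d' is computed. A common convention, assumed here and not confirmed by the record, derives d' from the ROC AUC under an equal-variance Gaussian model: d' = sqrt(2) * Phi^-1(AUC), where Phi^-1 is the inverse standard normal CDF. The sketch below is illustrative only; the function names are hypothetical, and it simply shows which AUC values the two reported d' figures would correspond to under that assumption.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def d_prime_from_auc(auc: float) -> float:
    """Convert ROC AUC to d' assuming equal-variance Gaussian score distributions."""
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(auc)


def auc_from_d_prime(d_prime: float) -> float:
    """Inverse of d_prime_from_auc under the same assumption."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2.0))


if __name__ == "__main__":
    # d' values quoted in the summary (ResNet50, strong evaluation data).
    for d in (1.13, 1.41):
        print(f"d' = {d:.2f} corresponds to AUC = {auc_from_d_prime(d):.4f}")
```

Because the mapping is monotonic, a higher d' always means a higher AUC; d' just spreads out differences near the top of the AUC range, which can make comparisons between strong classifiers easier to read.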