Seeing Sound: Investigating the Effects of Visualizations and Complexity on Crowdsourced Audio Annotations
Published in: Proceedings of the ACM on Human-Computer Interaction, 2017-12, Vol. 1 (CSCW), pp. 1-21, Article 29
Main authors: , , , , , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Audio annotation is key to developing machine-listening systems; yet, effective ways to accurately and rapidly obtain crowdsourced audio annotations are understudied. In this work, we seek to quantify the reliability/redundancy trade-off in crowdsourced soundscape annotation, investigate how visualizations affect accuracy and efficiency, and characterize how performance varies as a function of audio characteristics. Using a controlled experiment, we varied sound visualizations and the complexity of soundscapes presented to human annotators. Results show that more complex audio scenes result in lower annotator agreement, and that spectrogram visualizations are superior in producing higher-quality annotations at a lower cost in time and human labor. We also found that recall is more affected than precision by soundscape complexity, and that mistakes can often be attributed to certain sound event characteristics. These findings have implications not only for how we should design annotation tasks and interfaces for audio data, but also for how we train and evaluate machine-listening systems.
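The abstract's claim that recall suffers more than precision as soundscapes grow complex is easiest to see with a clip-level tagging metric. The sketch below is not code from the paper; the precision_recall helper and the event labels are hypothetical, purely to illustrate how missed faint events lower recall while leaving precision intact.

```python
# Minimal sketch (assumed setup, not the paper's evaluation code):
# score one annotator's clip-level sound-event tags against a reference set.

def precision_recall(annotated, reference):
    """Precision/recall of an annotator's tag set vs. the reference tag set."""
    annotated, reference = set(annotated), set(reference)
    true_positives = len(annotated & reference)
    precision = true_positives / len(annotated) if annotated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical complex soundscape: the annotator misses the fainter events,
# so recall drops while precision stays high.
reference = ["jackhammer", "car horn", "siren", "dog bark"]
annotated = ["jackhammer", "car horn"]

p, r = precision_recall(annotated, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.50
```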
ISSN: 2573-0142
DOI: 10.1145/3134664