A Human-in-the-Loop System for Sound Event Detection and Annotation

Labeling of audio events is essential for many tasks. However, finding and labeling sound events within a long audio file is tedious and time-consuming. In cases where there is very little labeled data (e.g., a single labeled example), it is often not feasible to train an automatic labeler, because many techniques (e.g., deep learning) require a large number of human-labeled training examples. Fully automated labeling may also not agree sufficiently with human labeling for many uses. To address this, we present a human-in-the-loop sound labeling system that helps a user quickly label target sound events in a long audio file. It reduces the time required to label a long recording (e.g., 20 hours) in which the target sounds are sparsely distributed (10% or less of the audio contains the target) and there are too few labeled examples (e.g., one) to train a state-of-the-art machine audio labeling system. To evaluate the effectiveness of our tool, we performed a human-subject study. The results show that it helped participants label target sound events twice as fast as labeling them manually. In addition to measuring the overall performance of the proposed system, we also measured interaction overhead and machine accuracy, the two key factors that determine overall performance. The analysis shows that an ideal interface with no interaction overhead could speed labeling by as much as a factor of four.
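The abstract summarizes the interaction pattern without specifying the underlying model. As a minimal, illustrative sketch of a generic human-in-the-loop labeling loop of this kind: the function names, the fixed-length segment features, and the cosine-similarity ranker below are all assumptions chosen for brevity, not the authors' implementation.

```python
# Illustrative sketch of a generic human-in-the-loop audio labeling loop.
# NOT the paper's implementation: the nearest-prototype scoring and all
# names here are assumptions made for the sake of a runnable example.
import numpy as np

def human_in_the_loop_labeling(segments, seed_example, ask_human, rounds=5, k=10):
    """Iteratively surface the k most likely target segments for human review.

    segments:     (n, d) array of per-segment audio features
    seed_example: (d,) feature vector of the single labeled target example
    ask_human:    callback(segment_index) -> bool, the human's yes/no judgment
    """
    positives = [seed_example]   # grows as the human confirms matches
    labels = {}                  # segment index -> True/False

    for _ in range(rounds):
        # Score every segment by similarity to the confirmed examples.
        proto = np.mean(positives, axis=0)
        scores = segments @ proto / (
            np.linalg.norm(segments, axis=1) * np.linalg.norm(proto) + 1e-9
        )
        unlabeled = [i for i in range(len(segments)) if i not in labels]
        # Present only the top-k candidates to the human instead of the
        # whole file; this is where the trade-off between interaction
        # overhead and machine accuracy plays out.
        candidates = sorted(unlabeled, key=lambda i: -scores[i])[:k]
        for i in candidates:
            labels[i] = ask_human(i)
            if labels[i]:
                positives.append(segments[i])
    return labels
```

In this framing, the overall speedup depends on how few candidates the human must audition per confirmed target (machine accuracy) and how quickly each candidate can be judged (interaction overhead), the two factors the study measures separately.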

Bibliographic Details

Published in: ACM Transactions on Interactive Intelligent Systems, 2018-07, Vol. 8 (2), p. 1-23
Authors: Kim, Bongjun; Pardo, Bryan
Format: Article
Language: English
ISSN: 2160-6455
eISSN: 2160-6463
DOI: 10.1145/3214366
Online access: Full text