Robust Sound Event Classification Using Deep Neural Networks

The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques.
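The abstract names mel-frequency cepstral coefficients as the baseline front end translated from speech recognition. As a rough illustration only (this is not the authors' pipeline; the sampling rate, frame size, hop, filter count and the synthetic test tone are all assumptions made here for the sketch), a minimal numpy MFCC extractor looks like:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel filterbank, shape (n_filters, n_fft//2 + 1)."""
    fmax = fmax or sr / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                      # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Frame -> Hamming window -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II over the mel axis; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T

# 0.5 s of a 440 Hz tone plus additive noise, echoing the paper's
# evaluation under corrupting noise (tone and noise level are arbitrary)
rng = np.random.default_rng(0)
t = np.arange(8000) / 16000.0
sig = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(8000)
feats = mfcc(sig)
print(feats.shape)  # (30, 13): one 13-coefficient vector per frame
```

The spectrogram image-based front end the paper compares against would instead keep the log spectrogram itself as a 2-D image-like feature and feed it to the SVM or DNN classifier, rather than compressing each frame to cepstral coefficients.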

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-03, Vol. 23 (3), p. 540-552
Authors: McLoughlin, Ian; Haomin Zhang; Zhipeng Xie; Yan Song; Wei Xiao
Format: Article
Language: English
Online access: full text
DOI: 10.1109/TASLP.2015.2389618
ISSN: 2329-9290
EISSN: 2329-9304
Source: IEEE Electronic Library (IEL)
Subjects: Auditory event detection; Auditory system; Automation; Classification; Computer simulation; Ears & hearing; Feature extraction; machine hearing; Mathematical models; Neural networks; Noise; Sound; Spectrogram; Speech; Speech processing; Support vector machines; Tasks; Vectors