Robust Sound Event Classification Using Deep Neural Networks

The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques.
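The abstract names mel-frequency cepstral coefficients as the baseline front end translated from speech recognition. As a rough illustration only (this is not the authors' pipeline; the sampling rate, frame size, hop, filter count and the synthetic test tone are all assumptions made here for the sketch), a minimal numpy MFCC extractor looks like:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel filterbank, shape (n_filters, n_fft//2 + 1)."""
    fmax = fmax or sr / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                      # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Frame -> Hamming window -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II over the mel axis; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T

# 0.5 s of a 440 Hz tone plus additive noise, echoing the paper's
# evaluation under corrupting noise (tone and noise level are arbitrary)
rng = np.random.default_rng(0)
t = np.arange(8000) / 16000.0
sig = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(8000)
feats = mfcc(sig)
print(feats.shape)  # (30, 13): one 13-coefficient vector per frame
```

The spectrogram image-based front end the paper compares against would instead keep the log spectrogram itself as a 2-D image-like feature and feed it to the SVM or DNN classifier, rather than compressing each frame to cepstral coefficients.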

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-03, Vol. 23 (3), p. 540-552
Authors: McLoughlin, Ian; Haomin Zhang; Zhipeng Xie; Yan Song; Wei Xiao
Format: Article
Language: English
Online access: full text
DOI: 10.1109/TASLP.2015.2389618
ISSN: 2329-9290
EISSN: 2329-9304
Source: IEEE Electronic Library (IEL)
Subjects: Auditory event detection; Auditory system; Automation; Classification; Computer simulation; Ears & hearing; Feature extraction; machine hearing; Mathematical models; Neural networks; Noise; Sound; Spectrogram; Speech; Speech processing; Support vector machines; Tasks; Vectors