Robust Sound Event Classification Using Deep Neural Networks
The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques.
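The abstract contrasts a speech-style MFCC front end with a spectrogram-image front end feeding a classifier. The following is a minimal illustrative sketch of that contrast, not the authors' implementation: it assumes librosa and scikit-learn are available, substitutes synthetic tones for real sound-event recordings, and uses a linear SVM in place of the paper's deep neural network.

```python
# Illustrative sketch only -- NOT the paper's implementation.
# Compares an MFCC front end with a spectrogram-image front end,
# each feeding a linear SVM, as the abstract describes at a high level.
import numpy as np
import librosa
from sklearn.svm import SVC

sr = 16000
rng = np.random.default_rng(0)

def synth_event(freq):
    """Synthetic stand-in for a 1-second sound-event clip (tone + noise)."""
    t = np.linspace(0.0, 1.0, sr, endpoint=False)
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(sr)

def mfcc_features(y):
    # Speech-style front end: mean MFCC vector over the clip.
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return m.mean(axis=1)

def spectrogram_image_features(y):
    # Image-style front end: downsampled log-magnitude spectrogram,
    # flattened so a generic classifier can consume it.
    S = np.abs(librosa.stft(y, n_fft=512, hop_length=256))
    img = librosa.amplitude_to_db(S, ref=np.max)
    return img[::8, ::8].ravel()

# Two toy "classes" of event, a few noisy examples each.
clips = [synth_event(f) for f in (440, 440, 440, 880, 880, 880)]
labels = [0, 0, 0, 1, 1, 1]

for name, fe in [("MFCC", mfcc_features),
                 ("spectrogram image", spectrogram_image_features)]:
    X = np.stack([fe(y) for y in clips])
    clf = SVC(kernel="linear").fit(X, labels)
    print(name, "training accuracy:", clf.score(X, labels))
```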
Saved in:
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2015-03, Vol.23 (3), p.540-552 |
---|---|
Main Authors: | McLoughlin, Ian; Haomin Zhang; Zhipeng Xie; Yan Song; Wei Xiao |
Format: | Article |
Language: | eng |
Subjects: | Auditory event detection; Auditory system; Automation; Classification; Computer simulation; Ears & hearing; Feature extraction; machine hearing; Mathematical models; Neural networks; Noise; Sound; Spectrogram; Speech; Speech processing; Support vector machines; Tasks; Vectors |
Online Access: | Order full text |
container_end_page | 552 |
---|---|
container_issue | 3 |
container_start_page | 540 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 23 |
creator | McLoughlin, Ian; Haomin Zhang; Zhipeng Xie; Yan Song; Wei Xiao |
description | The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques. |
doi_str_mv | 10.1109/TASLP.2015.2389618 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2015-03, Vol.23 (3), p.540-552 |
issn | 2329-9290 (ISSN); 2329-9304 (EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1109_TASLP_2015_2389618 |
source | IEEE Electronic Library (IEL) |
subjects | Auditory event detection; Auditory system; Automation; Classification; Computer simulation; Ears & hearing; Feature extraction; machine hearing; Mathematical models; Neural networks; Noise; Sound; Spectrogram; Speech; Speech processing; Support vector machines; Tasks; Vectors |
title | Robust Sound Event Classification Using Deep Neural Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T10%3A01%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Sound%20Event%20Classification%20Using%20Deep%20Neural%20Networks&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=McLoughlin,%20Ian&rft.date=2015-03&rft.volume=23&rft.issue=3&rft.spage=540&rft.epage=552&rft.pages=540-552&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2015.2389618&rft_dat=%3Cproquest_RIE%3E1677923070%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1660111936&rft_id=info:pmid/&rft_ieee_id=7003973&rfr_iscdi=true |