A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
Published in: | Applied sciences 2021-06, Vol.11 (11), p.4880, Article 4880 |
---|---|
Main authors: | Copiaco, Abigail; Ritz, Christian; Abdulaziz, Nidhal; Fasciani, Stefano |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 11 |
container_start_page | 4880 |
container_title | Applied sciences |
container_volume | 11 |
creator | Copiaco, Abigail; Ritz, Christian; Abdulaziz, Nidhal; Fasciani, Stefano
description | Featured Application
The algorithms explored in this research can be used for any multi-level classification application.
Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years, and has proven to provide an efficient alternative to training neural networks from scratch. The lower time and resource requirements when using pre-trained models allow for more versatility in developing system classification approaches. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on bigger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most apparent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance against various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out through Matlab, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The AlexNet network returned 86.24% at a size of 222.71 MB. |
doi_str_mv | 10.3390/app11114880 |
format | Article |
publisher | MDPI, Basel |
rights | 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
orcidid | https://orcid.org/0000-0002-3768-7569; https://orcid.org/0000-0001-5555-3225; https://orcid.org/0000-0003-2542-8604 |
eissn | 2076-3417 |
fulltext | fulltext |
identifier | ISSN: 2076-3417 |
ispartof | Applied sciences, 2021-06, Vol.11 (11), p.4880, Article 4880 |
issn | 2076-3417 2076-3417 |
language | eng |
recordid | cdi_webofscience_primary_000659620000001CitationCount |
source | DOAJ Directory of Open Access Journals; MDPI - Multidisciplinary Digital Publishing Institute; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals |
subjects | Acoustics; Chemistry; Chemistry, Multidisciplinary; Classification; Computer applications; Computer architecture; Concept learning; Engineering; Engineering, Multidisciplinary; Experimentation; Frequency; Graph theory; Log-mel; Materials Science; Materials Science, Multidisciplinary; MFCC; Monitoring systems; neural network; Neural networks; Noise; Parameter modification; Physical Sciences; Physics; Physics, Applied; pre-trained models; scalograms; Science & Technology; Signal processing; Sound; Technology; Temporal variations; Transfer learning; Wavelet transforms |
title | A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T06%3A26%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Study%20of%20Features%20and%20Deep%20Neural%20Network%20Architectures%20and%20Hyper-Parameters%20for%20Domestic%20Audio%20Classification&rft.jtitle=Applied%20sciences&rft.au=Copiaco,%20Abigail&rft.date=2021-06-01&rft.volume=11&rft.issue=11&rft.spage=4880&rft.pages=4880-&rft.artnum=4880&rft.issn=2076-3417&rft.eissn=2076-3417&rft_id=info:doi/10.3390/app11114880&rft_dat=%3Cproquest_webof%3E2635406637%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2635406637&rft_id=info:pmid/&rft_doaj_id=oai_doaj_org_article_f8cdd6f06bbc41a1931cf55967f9ce48&rfr_iscdi=true |
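The description field above summarizes the approach: scalogram (spectro-temporal) features extracted from four-channel recordings, classified by a pre-trained AlexNet adapted through transfer learning, and evaluated with a weighted F1-score. The paper's experiments were run in Matlab; the sketch below is only an illustrative Python analogue, assuming NumPy, PyWavelets, torchvision, and scikit-learn are available, and it does not reproduce the authors' compact AlexNet-33 architecture. The class count, labels, and input dimensions are placeholders.

```python
# Illustrative sketch only -- not the authors' Matlab implementation.
# It mirrors the idea described in the abstract: per-channel scalogram
# (continuous wavelet transform) features, a pre-trained AlexNet with
# its classifier head swapped for transfer learning, and a weighted
# F1-score for evaluation.
import numpy as np
import pywt                                # PyWavelets, assumed available
import torch.nn as nn
from torchvision import models
from sklearn.metrics import f1_score

def scalogram(audio, scales=np.arange(1, 65)):
    """Magnitude CWT per channel; `audio` has shape (channels, samples)."""
    images = []
    for ch in audio:
        coeffs, _ = pywt.cwt(ch, scales, "morl")   # Morlet wavelet
        images.append(np.abs(coeffs))
    return np.stack(images)                # (channels, len(scales), samples)

# Example: a 4-channel synthetic recording (placeholder random data).
recording = np.random.randn(4, 16000)
features = scalogram(recording)

# Transfer learning: reuse the pre-trained AlexNet feature extractor and
# replace the final fully-connected layer with one sized for the target
# sound classes. AlexNet expects 3-channel 227x227 inputs, so the
# 4-channel scalograms would need resizing/channel mapping first (omitted).
num_classes = 10                           # hypothetical class count
net = models.alexnet(weights="DEFAULT")
for p in net.features.parameters():
    p.requires_grad = False                # freeze convolutional layers
net.classifier[6] = nn.Linear(4096, num_classes)

# ... fine-tuning loop over the scalogram images omitted ...

# Weighted F1-score, the reporting metric used in the paper.
y_true = [0, 1, 2, 2, 1]                   # placeholder ground-truth labels
y_pred = [0, 1, 2, 1, 1]                   # placeholder predictions
print(f"weighted F1 = {f1_score(y_true, y_pred, average='weighted'):.4f}")
```

The design choice this mirrors is the one argued for in the abstract: reusing a pre-trained feature extractor and retraining only a small classifier head, which lowers training time and resource requirements compared with training a network from scratch.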