A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
Published in: | Applied sciences 2021-06, Vol.11 (11), p.4880, Article 4880 |
---|---|
Main authors: | Copiaco, Abigail; Ritz, Christian; Abdulaziz, Nidhal; Fasciani, Stefano |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 11 |
container_start_page | 4880 |
container_title | Applied sciences |
container_volume | 11 |
creator | Copiaco, Abigail; Ritz, Christian; Abdulaziz, Nidhal; Fasciani, Stefano
description | Featured Application
The algorithms explored in this research can be used for any multi-level classification application.
Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years, and has proven to provide an efficient alternative to training neural networks from scratch. The lower time and resource requirements when using pre-trained models allow for more versatility in developing system classification approaches. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on bigger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most apparent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance against various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out through Matlab, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The AlexNet network returned 86.24% at a size of 222.71 MB. |
doi_str_mv | 10.3390/app11114880 |
format | Article |
publisher | MDPI, Basel |
rights | 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
orcidid | https://orcid.org/0000-0002-3768-7569; https://orcid.org/0000-0001-5555-3225; https://orcid.org/0000-0003-2542-8604 |
eissn | 2076-3417 |
fulltext | fulltext |
identifier | ISSN: 2076-3417 |
ispartof | Applied sciences, 2021-06, Vol.11 (11), p.4880, Article 4880 |
issn | 2076-3417 2076-3417 |
language | eng |
recordid | cdi_webofscience_primary_000659620000001CitationCount |
source | DOAJ Directory of Open Access Journals; MDPI - Multidisciplinary Digital Publishing Institute; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals |
subjects | Acoustics; Chemistry; Chemistry, Multidisciplinary; Classification; Computer applications; Computer architecture; Concept learning; Engineering; Engineering, Multidisciplinary; Experimentation; Frequency; Graph theory; Log-mel; Materials Science; Materials Science, Multidisciplinary; MFCC; Monitoring systems; neural network; Neural networks; Noise; Parameter modification; Physical Sciences; Physics; Physics, Applied; pre-trained models; scalograms; Science & Technology; Signal processing; Sound; Technology; Temporal variations; Transfer learning; Wavelet transforms |
title | A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T06%3A26%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Study%20of%20Features%20and%20Deep%20Neural%20Network%20Architectures%20and%20Hyper-Parameters%20for%20Domestic%20Audio%20Classification&rft.jtitle=Applied%20sciences&rft.au=Copiaco,%20Abigail&rft.date=2021-06-01&rft.volume=11&rft.issue=11&rft.spage=4880&rft.pages=4880-&rft.artnum=4880&rft.issn=2076-3417&rft.eissn=2076-3417&rft_id=info:doi/10.3390/app11114880&rft_dat=%3Cproquest_webof%3E2635406637%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2635406637&rft_id=info:pmid/&rft_doaj_id=oai_doaj_org_article_f8cdd6f06bbc41a1931cf55967f9ce48&rfr_iscdi=true |
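The description field above summarizes the approach: scalogram (spectro-temporal) features extracted from four-channel recordings, classified by a pre-trained AlexNet adapted through transfer learning, and evaluated with a weighted F1-score. The paper's experiments were run in Matlab; the sketch below is only an illustrative Python analogue, assuming NumPy, PyWavelets, torchvision, and scikit-learn are available, and it does not reproduce the authors' compact AlexNet-33 architecture. The class count, labels, and input dimensions are placeholders.

```python
# Illustrative sketch only -- not the authors' Matlab implementation.
# It mirrors the idea described in the abstract: per-channel scalogram
# (continuous wavelet transform) features, a pre-trained AlexNet with
# its classifier head swapped for transfer learning, and a weighted
# F1-score for evaluation.
import numpy as np
import pywt                                # PyWavelets, assumed available
import torch.nn as nn
from torchvision import models
from sklearn.metrics import f1_score

def scalogram(audio, scales=np.arange(1, 65)):
    """Magnitude CWT per channel; `audio` has shape (channels, samples)."""
    images = []
    for ch in audio:
        coeffs, _ = pywt.cwt(ch, scales, "morl")   # Morlet wavelet
        images.append(np.abs(coeffs))
    return np.stack(images)                # (channels, len(scales), samples)

# Example: a 4-channel synthetic recording (placeholder random data).
recording = np.random.randn(4, 16000)
features = scalogram(recording)

# Transfer learning: reuse the pre-trained AlexNet feature extractor and
# replace the final fully-connected layer with one sized for the target
# sound classes. AlexNet expects 3-channel 227x227 inputs, so the
# 4-channel scalograms would need resizing/channel mapping first (omitted).
num_classes = 10                           # hypothetical class count
net = models.alexnet(weights="DEFAULT")
for p in net.features.parameters():
    p.requires_grad = False                # freeze convolutional layers
net.classifier[6] = nn.Linear(4096, num_classes)

# ... fine-tuning loop over the scalogram images omitted ...

# Weighted F1-score, the reporting metric used in the paper.
y_true = [0, 1, 2, 2, 1]                   # placeholder ground-truth labels
y_pred = [0, 1, 2, 1, 1]                   # placeholder predictions
print(f"weighted F1 = {f1_score(y_true, y_pred, average='weighted'):.4f}")
```

The design choice this mirrors is the one argued for in the abstract: reusing a pre-trained feature extractor and retraining only a small classifier head, which lowers training time and resource requirements compared with training a network from scratch.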