A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification


Bibliographic Details
Published in: Applied sciences 2021-06, Vol.11 (11), p.4880, Article 4880
Main Authors: Copiaco, Abigail, Ritz, Christian, Abdulaziz, Nidhal, Fasciani, Stefano
Format: Article
Language: English
Subjects:
Online Access: Full text
description Featured Application: The algorithms explored in this research can be used for any multi-level classification application. Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, the concept of transfer learning has been widely used over the years, and has proven to provide an efficient alternative to training neural networks from scratch. The lower time and resource requirements when using pre-trained models allow for more versatility in developing classification approaches. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on larger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in Matlab, using a database that we generated for this task, which comprises four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The AlexNet network returned 86.24% at a size of 222.71 MB.
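The abstract reports its results as a weighted F1-score. For reference, the following sketch assumes the standard support-weighted convention (the record itself does not state which averaging is used): each class's F1-score is combined in proportion to that class's share of the test samples.

% Assumed standard definition of the weighted F1-score cited in the abstract.
% P_c, R_c: precision and recall for class c; n_c: samples of class c; N: total samples; C: number of classes.
\[
  F1_c = \frac{2\,P_c\,R_c}{P_c + R_c}, \qquad
  F1_{\mathrm{weighted}} = \sum_{c=1}^{C} \frac{n_c}{N}\, F1_c
\]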
doi 10.3390/app11114880
identifier ISSN: 2076-3417
source DOAJ Directory of Open Access Journals; MDPI - Multidisciplinary Digital Publishing Institute; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals
subjects Acoustics
Chemistry
Chemistry, Multidisciplinary
Classification
Computer applications
Computer architecture
Concept learning
Engineering
Engineering, Multidisciplinary
Experimentation
Frequency
Graph theory
Log-mel
Materials Science
Materials Science, Multidisciplinary
MFCC
Monitoring systems
neural network
Neural networks
Noise
Parameter modification
Physical Sciences
Physics
Physics, Applied
pre-trained models
scalograms
Science & Technology
Signal processing
Sound
Technology
Temporal variations
Transfer learning
Wavelet transforms
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T06%3A26%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Study%20of%20Features%20and%20Deep%20Neural%20Network%20Architectures%20and%20Hyper-Parameters%20for%20Domestic%20Audio%20Classification&rft.jtitle=Applied%20sciences&rft.au=Copiaco,%20Abigail&rft.date=2021-06-01&rft.volume=11&rft.issue=11&rft.spage=4880&rft.pages=4880-&rft.artnum=4880&rft.issn=2076-3417&rft.eissn=2076-3417&rft_id=info:doi/10.3390/app11114880&rft_dat=%3Cproquest_webof%3E2635406637%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2635406637&rft_id=info:pmid/&rft_doaj_id=oai_doaj_org_article_f8cdd6f06bbc41a1931cf55967f9ce48&rfr_iscdi=true