Deep learning based cough detection camera using enhanced features

Coughing is a typical symptom of COVID-19. To detect and localize coughing sounds remotely, a convolutional neural network (CNN) based deep learning model was developed in this work and integrated with a sound camera to visualize cough sounds. The cough detection model is a binary classifier whose input is a two-second acoustic feature and whose output is one of two inferences (Cough or Others). Data augmentation was performed on the collected audio files to alleviate class imbalance and to reflect the varied background noise of practical environments. For effective featuring of the cough sound, conventional features such as spectrograms, mel-scaled spectrograms, and mel-frequency cepstral coefficients (MFCC) were reinforced with their velocity (V) and acceleration (A) maps. VGGNet, GoogLeNet, and ResNet were simplified into binary classifiers, named V-net, G-net, and R-net, respectively. To find the best combination of features and networks, 39 cases were trained and compared by test F1 score. The best result, a test F1 score of 91.9% (test accuracy of 97.2%), was achieved by G-net with the MFCC-V-A feature (named Spectroflow), an acoustic feature effective for cough detection. The trained model was then integrated with a sound camera, i.e., one that visualizes sound sources using a beamforming microphone array. In a pilot test, the cough detection camera detected coughing with an F1 score of 90.0% (accuracy of 96.0%) and tracked the cough location in the camera image in real time.
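The Spectroflow idea described above stacks a base feature map with its velocity (first time derivative) and acceleration (second time derivative) maps. The record does not specify the exact delta formula, so the sketch below uses simple finite differences via `np.gradient` as a stand-in; a regression-based delta (e.g. `librosa.feature.delta`) could be substituted. The function name and toy MFCC shape are illustrative assumptions, not from the paper.

```python
import numpy as np

def add_velocity_acceleration(feature: np.ndarray) -> np.ndarray:
    """Stack a 2-D acoustic feature (coefficients x time frames) with its
    velocity and acceleration maps along the time axis, yielding a
    3-channel array analogous to the MFCC-V-A input described above.

    Note: np.gradient is an assumed stand-in for the paper's delta method.
    """
    velocity = np.gradient(feature, axis=1)       # d/dt along time frames
    acceleration = np.gradient(velocity, axis=1)  # d^2/dt^2 along time frames
    return np.stack([feature, velocity, acceleration], axis=0)

# Toy example: a hypothetical 13-coefficient MFCC over 20 frames
mfcc = np.random.randn(13, 20)
stacked = add_velocity_acceleration(mfcc)
print(stacked.shape)  # (3, 13, 20)
```

The three channels can then be fed to a CNN classifier the same way an RGB image would be.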

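The abstract reports model quality as an F1 score alongside accuracy, which is the appropriate headline metric for an imbalanced binary task such as cough vs. non-cough. For reference, binary F1 is the harmonic mean of precision and recall; a minimal sketch (generic definition, not code from the paper):

```python
import numpy as np

def f1_score(y_true, y_pred) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall),
    treating label 1 as the positive (Cough) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy predictions: precision = recall = 2/3, so F1 = 2/3
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

On a skewed dataset a classifier can reach high accuracy by predicting the majority class, while its F1 on the rare positive class stays low, which is why the paper reports both numbers.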
Bibliographic details
Published in: arXiv.org, 2022-05
Main authors: Lee, Gyeong-Tae; Nam, Hyeonuk; Kim, Seong-Hu; Choi, Sang-Min; Kim, Youngkey; Park, Yong-Hwa
Format: Article
Language: English
Online access: Full text
DOI: 10.48550/arXiv.2107.13260
EISSN: 2331-8422
Subjects:
Acceleration
Acoustics
Artificial neural networks
Audio data
Background noise
Beamforming
Cameras
Classifiers
Computer Science - Sound
Cough
Deep learning
Sound
Sound sources
Spectrograms