Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes
Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, so computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as input and returns ratings in three categories: overall quality, noise, and sound quality. The metric is available via a web API and is composed of an ensemble of five deep neural networks that use either ResNet-26 architectures with STFT inputs or fully connected architectures with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, without requiring the clean speech reference samples needed by the metrics that perform better on the remaining 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods, including classical methods (e.g. Wiener filtering) and newer deep-learning methods.
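The architecture described in the abstract lends itself to a compact illustration. The Python sketch below shows the general shape of such a non-intrusive metric: log-magnitude STFT features are computed from a raw sound sample, and the per-category predictions of several models are averaged into ensemble ratings. This is not the authors' implementation; the helper names, the 0–5 rating scale, the STFT window length, and the stand-in `dummy_network` predictor are assumptions made purely for illustration (the paper's actual ensemble uses ResNet-26 models on STFT inputs and fully connected models on wav2vec features, trained on crowd-sourced ratings).

```python
# Minimal sketch of a non-intrusive ensemble metric, NOT the authors' code.
# Rating scale, window size, and the dummy predictor are illustrative assumptions.
import numpy as np
from scipy.signal import stft

CATEGORIES = ["overall_quality", "noise", "sound_quality"]  # the three rating categories

def stft_features(waveform: np.ndarray, sample_rate: int = 16_000) -> np.ndarray:
    """Log-magnitude STFT of a mono sound sample (nperseg is an assumed value)."""
    _, _, spectrum = stft(waveform, fs=sample_rate, nperseg=512)
    return np.log1p(np.abs(spectrum))

def dummy_network(features: np.ndarray, seed: int) -> dict:
    """Stand-in for one trained ensemble member (e.g. ResNet-26 on STFT input).
    Here it simply maps pooled features to one rating per category."""
    rng = np.random.default_rng(seed)
    pooled = features.mean()
    return {c: float(np.clip(pooled + rng.normal(scale=0.1), 0.0, 5.0)) for c in CATEGORIES}

def ensemble_rating(waveform: np.ndarray, n_networks: int = 5) -> dict:
    """Average the per-category predictions of all ensemble members."""
    features = stft_features(waveform)
    predictions = [dummy_network(features, seed=i) for i in range(n_networks)]
    return {c: float(np.mean([p[c] for p in predictions])) for c in CATEGORIES}

if __name__ == "__main__":
    placeholder_audio = np.random.default_rng(0).normal(size=16_000)  # 1 s of placeholder signal
    print(ensemble_rating(placeholder_audio))
```

Because only the degraded sample itself is needed, a metric of this shape can be applied in settings where reference-based metrics are unusable, which is the point of the non-intrusive design highlighted in the abstract.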
Published in: | PloS one 2022-11, Vol.17 (11), p.e0278170 |
---|---|
Main authors: | Diehl, Peter Udo; Thorbergsson, Leifur; Singer, Yosef; Skripniuk, Vladislav; Pudszuhn, Annett; Hofmann, Veit M; Sprengel, Elias; Meyer-Rachner, Paul |
Format: | Article |
Language: | English |
DOI: | 10.1371/journal.pone.0278170 |
ISSN: | 1932-6203 |
PMID: | 36441711 |
Publisher: | Public Library of Science |
Subjects: | Accuracy; Acoustic properties; Acoustics; Algorithms; Analysis; Artificial neural networks; Benchmarking; Benchmarks; Biology and Life Sciences; Computational linguistics; Computer and Information Sciences; Computer applications; Datasets; Deep Learning; Engineering and Technology; Hearing aids; Humans; Language processing; Machine learning; Natural language interfaces; Neural networks; Physical Sciences; Ratings; Ratings & rankings; Social Sciences; Sound; Speech; Video communication; Videoconferencing |
Online access: | Full text |