Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes

Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, thus computational metrics are typically u...

Bibliographic Details
Published in: PloS one 2022-11, Vol.17 (11), p.e0278170
Main Authors: Diehl, Peter Udo; Thorbergsson, Leifur; Singer, Yosef; Skripniuk, Vladislav; Pudszuhn, Annett; Hofmann, Veit M; Sprengel, Elias; Meyer-Rachner, Paul
Format: Article
Language: English
Subjects:
Online Access: Full text
container_end_page
container_issue 11
container_start_page e0278170
container_title PloS one
container_volume 17
creator Diehl, Peter Udo
Thorbergsson, Leifur
Singer, Yosef
Skripniuk, Vladislav
Pudszuhn, Annett
Hofmann, Veit M
Sprengel, Elias
Meyer-Rachner, Paul
description Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, thus computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as an input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of a deep neural network ensemble with 5 networks that use either ResNet-26 architectures with STFT inputs or fully-connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, while not requiring clean speech reference samples as opposed to metrics that are performing well on the other 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. Wiener-filtering) and newer deep-learning methods.
doi_str_mv 10.1371/journal.pone.0278170
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2740840683</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A728146328</galeid><doaj_id>oai_doaj_org_article_d1552afcbfb44ab5bc055257428b4747</doaj_id><sourcerecordid>A728146328</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-2b42018877b52715ec95c50be0c2b78cb2f56c3513864463090c88aa914429193</originalsourceid><addsrcrecordid>eNqNk8tq3DAUhk1padK0b1BaQaG0C08lWbLkTSGEXgZCA71thSTLtoLHciQ5Td6-mhknjEsWxQuZo-_cfp2TZS8RXKGCoQ-XbvKD7FejG8wKYsYRg4-yY1QVOC8xLB4f_B9lz0K4hJAWvCyfZkdFSQhiCB1nN9_ckNsh-inYawNqY0bQG-kHO7S5ksHUQLvNOEUZrUvpQBiN0R3YmOitDuCPjR3obNvlUuvJS30LpPYuBCDTXW2Al0NrgGuS2U0hWg2CNoMJz7MnjeyDeTGfJ9mvz59-nn3Nzy--rM9Oz3NdVjjmWBEMEeeMKYoZokZXVFOoDNRYMa4VbmipC4pSZ4SUBayg5lzKChGCqyTASfZ6H3fsXRCzaEFgRiAnsORFItZ7onbyUozebqS_FU5asTM43wrpU-G9ETWiFMtGq0YRIhVVGiYDZQRzRRhhKdbHOdukNqZOnUYv-0XQ5c1gO9G6a1ExSCjZlvtuDuDd1WRCFBubBOt7OZik367uCsES0YS--Qd9uLuZamVqwA6NS3n1Nqg4ZZijpBnmiVo9QKWvNhur04Q1NtkXDu8XDomJ5ia2cgpBrH98_3_24veSfXvAdkb2sQuun7bTF5Yg2YO7afOmuRcZQbFdkDs1xHZBxLwgye3V4QPdO91tRPEXZMoKyw</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2740840683</pqid></control><display><type>article</type><title>Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><source>Public Library of Science (PLoS)</source><creator>Diehl, Peter Udo ; Thorbergsson, Leifur ; Singer, Yosef ; Skripniuk, Vladislav ; Pudszuhn, Annett ; Hofmann, Veit M ; Sprengel, Elias ; Meyer-Rachner, Paul</creator><creatorcontrib>Diehl, Peter Udo ; Thorbergsson, Leifur ; Singer, Yosef ; Skripniuk, Vladislav ; Pudszuhn, 
Annett ; Hofmann, Veit M ; Sprengel, Elias ; Meyer-Rachner, Paul</creatorcontrib><description>Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, thus computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as an input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of a deep neural network ensemble with 5 networks that use either ResNet-26 architectures with STFT inputs or fully-connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, while not requiring clean speech reference samples as opposed to metrics that are performing well on the other 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. 
Wiener-filtering) and newer deep-learning methods.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0278170</identifier><identifier>PMID: 36441711</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Accuracy ; Acoustic properties ; Acoustics ; Algorithms ; Analysis ; Artificial neural networks ; Benchmarking ; Benchmarks ; Biology and Life Sciences ; Computational linguistics ; Computer and Information Sciences ; Computer applications ; Datasets ; Deep Learning ; Engineering and Technology ; Hearing aids ; Humans ; Language processing ; Machine learning ; Natural language interfaces ; Neural networks ; Physical Sciences ; Ratings ; Ratings &amp; rankings ; Social Sciences ; Sound ; Speech ; Video communication ; Videoconferencing</subject><ispartof>PloS one, 2022-11, Vol.17 (11), p.e0278170</ispartof><rights>Copyright: © 2022 Diehl et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</rights><rights>COPYRIGHT 2022 Public Library of Science</rights><rights>2022 Diehl et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2022 Diehl et al 2022 Diehl et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-2b42018877b52715ec95c50be0c2b78cb2f56c3513864463090c88aa914429193</citedby><cites>FETCH-LOGICAL-c692t-2b42018877b52715ec95c50be0c2b78cb2f56c3513864463090c88aa914429193</cites><orcidid>0000-0001-6683-3011 ; 0000-0002-7316-2444</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9704549/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9704549/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79342,79343</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36441711$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Diehl, Peter Udo</creatorcontrib><creatorcontrib>Thorbergsson, Leifur</creatorcontrib><creatorcontrib>Singer, Yosef</creatorcontrib><creatorcontrib>Skripniuk, Vladislav</creatorcontrib><creatorcontrib>Pudszuhn, Annett</creatorcontrib><creatorcontrib>Hofmann, Veit M</creatorcontrib><creatorcontrib>Sprengel, Elias</creatorcontrib><creatorcontrib>Meyer-Rachner, Paul</creatorcontrib><title>Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. 
While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, thus computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as an input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of a deep neural network ensemble with 5 networks that use either ResNet-26 architectures with STFT inputs or fully-connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, while not requiring clean speech reference samples as opposed to metrics that are performing well on the other 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. 
Wiener-filtering) and newer deep-learning methods.</description><subject>Accuracy</subject><subject>Acoustic properties</subject><subject>Acoustics</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Artificial neural networks</subject><subject>Benchmarking</subject><subject>Benchmarks</subject><subject>Biology and Life Sciences</subject><subject>Computational linguistics</subject><subject>Computer and Information Sciences</subject><subject>Computer applications</subject><subject>Datasets</subject><subject>Deep Learning</subject><subject>Engineering and Technology</subject><subject>Hearing aids</subject><subject>Humans</subject><subject>Language processing</subject><subject>Machine learning</subject><subject>Natural language interfaces</subject><subject>Neural networks</subject><subject>Physical Sciences</subject><subject>Ratings</subject><subject>Ratings &amp; rankings</subject><subject>Social Sciences</subject><subject>Sound</subject><subject>Speech</subject><subject>Video 
communication</subject><subject>Videoconferencing</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqNk8tq3DAUhk1padK0b1BaQaG0C08lWbLkTSGEXgZCA71thSTLtoLHciQ5Td6-mhknjEsWxQuZo-_cfp2TZS8RXKGCoQ-XbvKD7FejG8wKYsYRg4-yY1QVOC8xLB4f_B9lz0K4hJAWvCyfZkdFSQhiCB1nN9_ckNsh-inYawNqY0bQG-kHO7S5ksHUQLvNOEUZrUvpQBiN0R3YmOitDuCPjR3obNvlUuvJS30LpPYuBCDTXW2Al0NrgGuS2U0hWg2CNoMJz7MnjeyDeTGfJ9mvz59-nn3Nzy--rM9Oz3NdVjjmWBEMEeeMKYoZokZXVFOoDNRYMa4VbmipC4pSZ4SUBayg5lzKChGCqyTASfZ6H3fsXRCzaEFgRiAnsORFItZ7onbyUozebqS_FU5asTM43wrpU-G9ETWiFMtGq0YRIhVVGiYDZQRzRRhhKdbHOdukNqZOnUYv-0XQ5c1gO9G6a1ExSCjZlvtuDuDd1WRCFBubBOt7OZik367uCsES0YS--Qd9uLuZamVqwA6NS3n1Nqg4ZZijpBnmiVo9QKWvNhur04Q1NtkXDu8XDomJ5ia2cgpBrH98_3_24veSfXvAdkb2sQuun7bTF5Yg2YO7afOmuRcZQbFdkDs1xHZBxLwgye3V4QPdO91tRPEXZMoKyw</recordid><startdate>20221128</startdate><enddate>20221128</enddate><creator>Diehl, Peter Udo</creator><creator>Thorbergsson, Leifur</creator><creator>Singer, Yosef</creator><creator>Skripniuk, Vladislav</creator><creator>Pudszuhn, Annett</creator><creator>Hofmann, Veit M</creator><creator>Sprengel, Elias</creator><creator>Meyer-Rachner, Paul</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-6683-3011</orcidid><orcidid>https://orcid.org/0000-0002-7316-2444</orcidid></search><sort><creationdate>20221128</creationdate><title>Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes</title><author>Diehl, Peter Udo ; Thorbergsson, Leifur ; Singer, Yosef ; Skripniuk, Vladislav ; Pudszuhn, Annett ; Hofmann, Veit M ; 
Sprengel, Elias ; Meyer-Rachner, Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-2b42018877b52715ec95c50be0c2b78cb2f56c3513864463090c88aa914429193</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Acoustic properties</topic><topic>Acoustics</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Artificial neural networks</topic><topic>Benchmarking</topic><topic>Benchmarks</topic><topic>Biology and Life Sciences</topic><topic>Computational linguistics</topic><topic>Computer and Information Sciences</topic><topic>Computer applications</topic><topic>Datasets</topic><topic>Deep Learning</topic><topic>Engineering and Technology</topic><topic>Hearing aids</topic><topic>Humans</topic><topic>Language processing</topic><topic>Machine learning</topic><topic>Natural language interfaces</topic><topic>Neural networks</topic><topic>Physical Sciences</topic><topic>Ratings</topic><topic>Ratings &amp; rankings</topic><topic>Social Sciences</topic><topic>Sound</topic><topic>Speech</topic><topic>Video communication</topic><topic>Videoconferencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Diehl, Peter Udo</creatorcontrib><creatorcontrib>Thorbergsson, Leifur</creatorcontrib><creatorcontrib>Singer, Yosef</creatorcontrib><creatorcontrib>Skripniuk, Vladislav</creatorcontrib><creatorcontrib>Pudszuhn, Annett</creatorcontrib><creatorcontrib>Hofmann, Veit M</creatorcontrib><creatorcontrib>Sprengel, Elias</creatorcontrib><creatorcontrib>Meyer-Rachner, Paul</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: 
Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science 
Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Diehl, Peter Udo</au><au>Thorbergsson, Leifur</au><au>Singer, Yosef</au><au>Skripniuk, Vladislav</au><au>Pudszuhn, Annett</au><au>Hofmann, Veit M</au><au>Sprengel, Elias</au><au>Meyer-Rachner, Paul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2022-11-28</date><risdate>2022</risdate><volume>17</volume><issue>11</issue><spage>e0278170</spage><pages>e0278170-</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, thus computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as an input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of a deep neural network ensemble with 5 networks that use either ResNet-26 architectures with STFT inputs or fully-connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. 
Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, while not requiring clean speech reference samples as opposed to metrics that are performing well on the other 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. Wiener-filtering) and newer deep-learning methods.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>36441711</pmid><doi>10.1371/journal.pone.0278170</doi><tpages>e0278170</tpages><orcidid>https://orcid.org/0000-0001-6683-3011</orcidid><orcidid>https://orcid.org/0000-0002-7316-2444</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2022-11, Vol.17 (11), p.e0278170
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2740840683
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Free Full-Text Journals in Chemistry; Public Library of Science (PLoS)
subjects Accuracy
Acoustic properties
Acoustics
Algorithms
Analysis
Artificial neural networks
Benchmarking
Benchmarks
Biology and Life Sciences
Computational linguistics
Computer and Information Sciences
Computer applications
Datasets
Deep Learning
Engineering and Technology
Hearing aids
Humans
Language processing
Machine learning
Natural language interfaces
Neural networks
Physical Sciences
Ratings
Ratings & rankings
Social Sciences
Sound
Speech
Video communication
Videoconferencing
title Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T12%3A21%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Non-intrusive%20deep%20learning-based%20computational%20speech%20metrics%20with%20high-accuracy%20across%20a%20wide%20range%20of%20acoustic%20scenes&rft.jtitle=PloS%20one&rft.au=Diehl,%20Peter%20Udo&rft.date=2022-11-28&rft.volume=17&rft.issue=11&rft.spage=e0278170&rft.pages=e0278170-&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0278170&rft_dat=%3Cgale_plos_%3EA728146328%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2740840683&rft_id=info:pmid/36441711&rft_galeid=A728146328&rft_doaj_id=oai_doaj_org_article_d1552afcbfb44ab5bc055257428b4747&rfr_iscdi=true
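
The description field of this record says that the metric is an ensemble of 5 networks returning ratings in three categories, and that it is evaluated by correlating its output with human ratings. The following is a minimal sketch of those two steps, not the authors' implementation. It assumes that the ensemble output is a plain average of the member predictions and that the correlation is Pearson's r; the record states neither. The names `CATEGORIES`, `ensemble_rating`, and `pearson`, and all numeric values, are hypothetical.

```python
from math import sqrt
from statistics import mean

# The three rating categories named in the abstract (key names are hypothetical).
CATEGORIES = ("overall", "noise", "sound_quality")


def ensemble_rating(member_predictions):
    """Combine per-network predictions into one rating per category.

    Assumption: simple averaging. The abstract does not say how the
    5 networks are combined.
    """
    return {c: mean(p[c] for p in member_predictions) for c in CATEGORIES}


def pearson(xs, ys):
    """Pearson correlation between metric scores and human ratings.

    Assumption: the abstract only says "correlations", not which kind.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up predictions from 5 hypothetical ensemble members for one sample.
members = [
    {"overall": 3.0, "noise": 4.0, "sound_quality": 2.0},
    {"overall": 3.5, "noise": 4.0, "sound_quality": 3.0},
    {"overall": 4.0, "noise": 4.0, "sound_quality": 2.0},
    {"overall": 3.5, "noise": 4.0, "sound_quality": 3.0},
    {"overall": 3.5, "noise": 4.0, "sound_quality": 2.5},
]
rating = ensemble_rating(members)

# Made-up metric scores and human ratings for the samples of one scene.
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Repeating the correlation step once per benchmark scene would give one value per scene, which is the form in which the abstract compares this metric with others.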