Statistical analysis of the autoregressive modeling of reverberant speech

Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room...

Detailed description

Bibliographic details
Published in: The Journal of the Acoustical Society of America 2006-12, Vol.120 (6), p.4031-4039
Main authors: Gaubitch, Nikolay D., Ward, Darren B., Naylor, Patrick A.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 4039
container_issue 6
container_start_page 4031
container_title The Journal of the Acoustical Society of America
container_volume 120
creator Gaubitch, Nikolay D.
Ward, Darren B.
Naylor, Patrick A.
description Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M-channel observation (M > 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) is approximately equal to the AR coefficients of the original speech, while for case (iii) there is a discrepancy, due to spatial correlation between the microphones, which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (< 0.3 m).
doi_str_mv 10.1121/1.2356840
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_85660095</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>68295369</sourcerecordid><originalsourceid>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</originalsourceid><addsrcrecordid>eNqF0U1Lw0AQBuBFFFurB_-A5KLgIXU_spvdiyDFj0LBg3oOm82kXUmTupMW-u9NaaRexNMy7MMMvC8hl4yOGePsjo25kEon9IgMmeQ01pInx2RIKWVxYpQakDPEz26UWphTMmAp5zLhZkimb61tPbbe2Sqyta226DFqyqhdQGTXbRNgHgDRbyBaNgVUvp7vvgNsIOQQbN1GuAJwi3NyUtoK4aJ_R-Tj6fF98hLPXp-nk4dZ7ARL2rjQJVcq5yLRpVBGMy546phkIley4NRoZ0qRS5Xktsh1KgQYyAsmLEtLrZUYkZv93lVovtaAbbb06KCqbA3NGjMtlaLUyH-h0rxTynTwdg9daBADlNkq-KUN24zRbBdwxrI-4M5e9UvX-RKKg-wT7cB1Dyx2mZZdQs7jwWmhUqrSzt3vHTq_q6Cp_776q6PspyPxDf6Tlx0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>68295369</pqid></control><display><type>article</type><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><source>MEDLINE</source><source>AIP Journals Complete</source><source>AIP Acoustical Society of America</source><creator>Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A.</creator><creatorcontrib>Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A.</creatorcontrib><description>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. 
Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M -channel observation ( M &gt; 1 ) ; and (iii) AR coefficients calculated from the output of a delay-and sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M -channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced ( &lt; 0.3 m ) .</description><identifier>ISSN: 0001-4966</identifier><identifier>EISSN: 1520-8524</identifier><identifier>DOI: 10.1121/1.2356840</identifier><identifier>PMID: 17225429</identifier><identifier>CODEN: JASMAN</identifier><language>eng</language><publisher>Woodbury, NY: Acoustical Society of America</publisher><subject>Acoustic signal processing ; Acoustics ; Architectural acoustics ; Exact sciences and technology ; Fundamental areas of phenomenology (including applications) ; Humans ; Models, Biological ; Physics ; Speech Perception ; Speech Production Measurement</subject><ispartof>The Journal of the Acoustical Society of America, 2006-12, Vol.120 (6), p.4031-4039</ispartof><rights>2006 Acoustical Society of America</rights><rights>2007 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</citedby><cites>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://pubs.aip.org/jasa/article-lookup/doi/10.1121/1.2356840$$EHTML$$P50$$Gscitation$$H</linktohtml><link.rule.ids>207,208,314,776,780,790,1559,4498,27901,27902,76127</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=18367067$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17225429$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gaubitch, Nikolay D.</creatorcontrib><creatorcontrib>Ward, Darren B.</creatorcontrib><creatorcontrib>Naylor, Patrick A.</creatorcontrib><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><title>The Journal of the Acoustical Society of America</title><addtitle>J Acoust Soc Am</addtitle><description>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. 
Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M -channel observation ( M &gt; 1 ) ; and (iii) AR coefficients calculated from the output of a delay-and sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M -channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced ( &lt; 0.3 m ) .</description><subject>Acoustic signal processing</subject><subject>Acoustics</subject><subject>Architectural acoustics</subject><subject>Exact sciences and technology</subject><subject>Fundamental areas of phenomenology (including applications)</subject><subject>Humans</subject><subject>Models, Biological</subject><subject>Physics</subject><subject>Speech Perception</subject><subject>Speech Production 
Measurement</subject><issn>0001-4966</issn><issn>1520-8524</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqF0U1Lw0AQBuBFFFurB_-A5KLgIXU_spvdiyDFj0LBg3oOm82kXUmTupMW-u9NaaRexNMy7MMMvC8hl4yOGePsjo25kEon9IgMmeQ01pInx2RIKWVxYpQakDPEz26UWphTMmAp5zLhZkimb61tPbbe2Sqyta226DFqyqhdQGTXbRNgHgDRbyBaNgVUvp7vvgNsIOQQbN1GuAJwi3NyUtoK4aJ_R-Tj6fF98hLPXp-nk4dZ7ARL2rjQJVcq5yLRpVBGMy546phkIley4NRoZ0qRS5Xktsh1KgQYyAsmLEtLrZUYkZv93lVovtaAbbb06KCqbA3NGjMtlaLUyH-h0rxTynTwdg9daBADlNkq-KUN24zRbBdwxrI-4M5e9UvX-RKKg-wT7cB1Dyx2mZZdQs7jwWmhUqrSzt3vHTq_q6Cp_776q6PspyPxDf6Tlx0</recordid><startdate>200612</startdate><enddate>200612</enddate><creator>Gaubitch, Nikolay D.</creator><creator>Ward, Darren B.</creator><creator>Naylor, Patrick A.</creator><general>Acoustical Society of America</general><general>American Institute of Physics</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>8BM</scope><scope>7T9</scope></search><sort><creationdate>200612</creationdate><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><author>Gaubitch, Nikolay D. ; Ward, Darren B. 
; Naylor, Patrick A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Acoustic signal processing</topic><topic>Acoustics</topic><topic>Architectural acoustics</topic><topic>Exact sciences and technology</topic><topic>Fundamental areas of phenomenology (including applications)</topic><topic>Humans</topic><topic>Models, Biological</topic><topic>Physics</topic><topic>Speech Perception</topic><topic>Speech Production Measurement</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gaubitch, Nikolay D.</creatorcontrib><creatorcontrib>Ward, Darren B.</creatorcontrib><creatorcontrib>Naylor, Patrick A.</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>ComDisDome</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>The Journal of the Acoustical Society of America</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gaubitch, Nikolay D.</au><au>Ward, Darren B.</au><au>Naylor, Patrick A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Statistical analysis of the autoregressive modeling of reverberant speech</atitle><jtitle>The Journal of the Acoustical Society of America</jtitle><addtitle>J Acoust Soc 
Am</addtitle><date>2006-12</date><risdate>2006</risdate><volume>120</volume><issue>6</issue><spage>4031</spage><epage>4039</epage><pages>4031-4039</pages><issn>0001-4966</issn><eissn>1520-8524</eissn><coden>JASMAN</coden><abstract>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M -channel observation ( M &gt; 1 ) ; and (iii) AR coefficients calculated from the output of a delay-and sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M -channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced ( &lt; 0.3 m ) .</abstract><cop>Woodbury, NY</cop><pub>Acoustical Society of America</pub><pmid>17225429</pmid><doi>10.1121/1.2356840</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0001-4966
ispartof The Journal of the Acoustical Society of America, 2006-12, Vol.120 (6), p.4031-4039
issn 0001-4966
1520-8524
language eng
recordid cdi_proquest_miscellaneous_85660095
source MEDLINE; AIP Journals Complete; AIP Acoustical Society of America
subjects Acoustic signal processing
Acoustics
Architectural acoustics
Exact sciences and technology
Fundamental areas of phenomenology (including applications)
Humans
Models, Biological
Physics
Speech Perception
Speech Production Measurement
title Statistical analysis of the autoregressive modeling of reverberant speech
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A54%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Statistical%20analysis%20of%20the%20autoregressive%20modeling%20of%20reverberant%20speech&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Gaubitch,%20Nikolay%20D.&rft.date=2006-12&rft.volume=120&rft.issue=6&rft.spage=4031&rft.epage=4039&rft.pages=4031-4039&rft.issn=0001-4966&rft.eissn=1520-8524&rft.coden=JASMAN&rft_id=info:doi/10.1121/1.2356840&rft_dat=%3Cproquest_cross%3E68295369%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=68295369&rft_id=info:pmid/17225429&rfr_iscdi=true
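
The three cases named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes the autocorrelation (Yule-Walker) method for the AR fit, assumes that "jointly" in case (ii) means solving one set of normal equations built from the channel-averaged autocorrelation, and assumes known integer sample delays for the beamformer in case (iii). All function names (autocorr, ar_from_autocorr, ar_single, ar_joint, ar_delay_and_sum) are hypothetical.

```python
import numpy as np


def autocorr(x, order):
    """Biased autocorrelation estimate r[0..order] of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])


def ar_from_autocorr(r):
    """Solve the Yule-Walker equations R a = r[1:] for the predictor a,
    using the convention x[n] = sum_k a[k-1] * x[n-k] + e[n]."""
    order = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def ar_single(x, order):
    """Case (i): AR coefficients from a single observation."""
    return ar_from_autocorr(autocorr(x, order))


def ar_joint(channels, order):
    """Case (ii), as assumed here: one AR fit from the autocorrelation
    averaged over all M channels."""
    r = np.mean([autocorr(x, order) for x in channels], axis=0)
    return ar_from_autocorr(r)


def ar_delay_and_sum(channels, delays, order):
    """Case (iii): AR fit on the delay-and-sum beamformer output.
    delays are integer sample delays (assumed known); np.roll shifts
    circularly, which is a simplification at the signal edges."""
    aligned = [np.roll(np.asarray(x, dtype=float), -d)
               for x, d in zip(channels, delays)]
    return ar_single(np.mean(aligned, axis=0), order)
```

The difference between cases (ii) and (iii) is where the averaging happens: case (ii) averages second-order statistics across channels, while case (iii) averages the aligned waveforms first, so cross-channel terms enter the autocorrelation of the beamformer output.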