Statistical analysis of the autoregressive modeling of reverberant speech
Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room...
Published in: | The Journal of the Acoustical Society of America 2006-12, Vol.120 (6), p.4031-4039 |
---|---|
Main authors: | Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 4039 |
---|---|
container_issue | 6 |
container_start_page | 4031 |
container_title | The Journal of the Acoustical Society of America |
container_volume | 120 |
creator | Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A. |
description | Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M-channel observation (M > 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (< 0.3 m). |
doi_str_mv | 10.1121/1.2356840 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_85660095</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>68295369</sourcerecordid><originalsourceid>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</originalsourceid><addsrcrecordid>eNqF0U1Lw0AQBuBFFFurB_-A5KLgIXU_spvdiyDFj0LBg3oOm82kXUmTupMW-u9NaaRexNMy7MMMvC8hl4yOGePsjo25kEon9IgMmeQ01pInx2RIKWVxYpQakDPEz26UWphTMmAp5zLhZkimb61tPbbe2Sqyta226DFqyqhdQGTXbRNgHgDRbyBaNgVUvp7vvgNsIOQQbN1GuAJwi3NyUtoK4aJ_R-Tj6fF98hLPXp-nk4dZ7ARL2rjQJVcq5yLRpVBGMy546phkIley4NRoZ0qRS5Xktsh1KgQYyAsmLEtLrZUYkZv93lVovtaAbbb06KCqbA3NGjMtlaLUyH-h0rxTynTwdg9daBADlNkq-KUN24zRbBdwxrI-4M5e9UvX-RKKg-wT7cB1Dyx2mZZdQs7jwWmhUqrSzt3vHTq_q6Cp_776q6PspyPxDf6Tlx0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>68295369</pqid></control><display><type>article</type><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><source>MEDLINE</source><source>AIP Journals Complete</source><source>AIP Acoustical Society of America</source><creator>Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A.</creator><creatorcontrib>Gaubitch, Nikolay D. ; Ward, Darren B. ; Naylor, Patrick A.</creatorcontrib><description>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an
M-channel observation (M &gt; 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (&lt; 0.3 m)
.</description><identifier>ISSN: 0001-4966</identifier><identifier>EISSN: 1520-8524</identifier><identifier>DOI: 10.1121/1.2356840</identifier><identifier>PMID: 17225429</identifier><identifier>CODEN: JASMAN</identifier><language>eng</language><publisher>Woodbury, NY: Acoustical Society of America</publisher><subject>Acoustic signal processing ; Acoustics ; Architectural acoustics ; Exact sciences and technology ; Fundamental areas of phenomenology (including applications) ; Humans ; Models, Biological ; Physics ; Speech Perception ; Speech Production Measurement</subject><ispartof>The Journal of the Acoustical Society of America, 2006-12, Vol.120 (6), p.4031-4039</ispartof><rights>2006 Acoustical Society of America</rights><rights>2007 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</citedby><cites>FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://pubs.aip.org/jasa/article-lookup/doi/10.1121/1.2356840$$EHTML$$P50$$Gscitation$$H</linktohtml><link.rule.ids>207,208,314,776,780,790,1559,4498,27901,27902,76127</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=18367067$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17225429$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gaubitch, Nikolay D.</creatorcontrib><creatorcontrib>Ward, Darren B.</creatorcontrib><creatorcontrib>Naylor, Patrick A.</creatorcontrib><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><title>The Journal of the Acoustical Society of 
America</title><addtitle>J Acoust Soc Am</addtitle><description>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an
M-channel observation (M &gt; 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (&lt; 0.3 m)
.</description><subject>Acoustic signal processing</subject><subject>Acoustics</subject><subject>Architectural acoustics</subject><subject>Exact sciences and technology</subject><subject>Fundamental areas of phenomenology (including applications)</subject><subject>Humans</subject><subject>Models, Biological</subject><subject>Physics</subject><subject>Speech Perception</subject><subject>Speech Production Measurement</subject><issn>0001-4966</issn><issn>1520-8524</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqF0U1Lw0AQBuBFFFurB_-A5KLgIXU_spvdiyDFj0LBg3oOm82kXUmTupMW-u9NaaRexNMy7MMMvC8hl4yOGePsjo25kEon9IgMmeQ01pInx2RIKWVxYpQakDPEz26UWphTMmAp5zLhZkimb61tPbbe2Sqyta226DFqyqhdQGTXbRNgHgDRbyBaNgVUvp7vvgNsIOQQbN1GuAJwi3NyUtoK4aJ_R-Tj6fF98hLPXp-nk4dZ7ARL2rjQJVcq5yLRpVBGMy546phkIley4NRoZ0qRS5Xktsh1KgQYyAsmLEtLrZUYkZv93lVovtaAbbb06KCqbA3NGjMtlaLUyH-h0rxTynTwdg9daBADlNkq-KUN24zRbBdwxrI-4M5e9UvX-RKKg-wT7cB1Dyx2mZZdQs7jwWmhUqrSzt3vHTq_q6Cp_776q6PspyPxDf6Tlx0</recordid><startdate>200612</startdate><enddate>200612</enddate><creator>Gaubitch, Nikolay D.</creator><creator>Ward, Darren B.</creator><creator>Naylor, Patrick A.</creator><general>Acoustical Society of America</general><general>American Institute of Physics</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>8BM</scope><scope>7T9</scope></search><sort><creationdate>200612</creationdate><title>Statistical analysis of the autoregressive modeling of reverberant speech</title><author>Gaubitch, Nikolay D. ; Ward, Darren B. 
; Naylor, Patrick A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c314t-d8f266b2348f369812327c1513b65d2098c9f3b564badb8733e9ebd13a17f8863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Acoustic signal processing</topic><topic>Acoustics</topic><topic>Architectural acoustics</topic><topic>Exact sciences and technology</topic><topic>Fundamental areas of phenomenology (including applications)</topic><topic>Humans</topic><topic>Models, Biological</topic><topic>Physics</topic><topic>Speech Perception</topic><topic>Speech Production Measurement</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gaubitch, Nikolay D.</creatorcontrib><creatorcontrib>Ward, Darren B.</creatorcontrib><creatorcontrib>Naylor, Patrick A.</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>ComDisDome</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>The Journal of the Acoustical Society of America</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gaubitch, Nikolay D.</au><au>Ward, Darren B.</au><au>Naylor, Patrick A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Statistical analysis of the autoregressive modeling of reverberant speech</atitle><jtitle>The Journal of the Acoustical Society of America</jtitle><addtitle>J Acoust Soc 
Am</addtitle><date>2006-12</date><risdate>2006</risdate><volume>120</volume><issue>6</issue><spage>4031</spage><epage>4039</epage><pages>4031-4039</pages><issn>0001-4966</issn><eissn>1520-8524</eissn><coden>JASMAN</coden><abstract>Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an
M-channel observation (M &gt; 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (&lt; 0.3 m)
.</abstract><cop>Woodbury, NY</cop><pub>Acoustical Society of America</pub><pmid>17225429</pmid><doi>10.1121/1.2356840</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0001-4966 |
ispartof | The Journal of the Acoustical Society of America, 2006-12, Vol.120 (6), p.4031-4039 |
issn | 0001-4966 1520-8524 |
language | eng |
recordid | cdi_proquest_miscellaneous_85660095 |
source | MEDLINE; AIP Journals Complete; AIP Acoustical Society of America |
subjects | Acoustic signal processing ; Acoustics ; Architectural acoustics ; Exact sciences and technology ; Fundamental areas of phenomenology (including applications) ; Humans ; Models, Biological ; Physics ; Speech Perception ; Speech Production Measurement |
title | Statistical analysis of the autoregressive modeling of reverberant speech |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A54%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Statistical%20analysis%20of%20the%20autoregressive%20modeling%20of%20reverberant%20speech&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=Gaubitch,%20Nikolay%20D.&rft.date=2006-12&rft.volume=120&rft.issue=6&rft.spage=4031&rft.epage=4039&rft.pages=4031-4039&rft.issn=0001-4966&rft.eissn=1520-8524&rft.coden=JASMAN&rft_id=info:doi/10.1121/1.2356840&rft_dat=%3Cproquest_cross%3E68295369%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=68295369&rft_id=info:pmid/17225429&rfr_iscdi=true |
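
The three cases named in the description can be illustrated with a short sketch. This is not the authors' implementation: the function names are hypothetical, the joint M-channel estimate is assumed here to come from averaging the per-channel autocorrelations before solving the Yule-Walker equations, and the beamformer is assumed to use known, non-negative integer sample delays.

```python
import numpy as np

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of a 1-D signal."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])

def ar_from_autocorr(r):
    """Solve the Yule-Walker equations R a = r[1:] for the AR coefficients a."""
    order = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def ar_single(x, order):
    """Case (i): AR coefficients from a single observation."""
    return ar_from_autocorr(autocorr(x, order))

def ar_multichannel(X, order):
    """Case (ii), as assumed here: average the autocorrelations of the
    M channels (rows of X), then solve one set of normal equations."""
    r = np.mean([autocorr(x, order) for x in X], axis=0)
    return ar_from_autocorr(r)

def delay_and_sum(X, delays):
    """Advance channel m by delays[m] samples (assumed known, non-negative
    integers), truncate to a common length, and average the channels."""
    n = X.shape[1] - max(delays)
    return np.mean([x[d : d + n] for x, d in zip(X, delays)], axis=0)

def ar_dsb(X, order, delays):
    """Case (iii): AR coefficients from the delay-and-sum beamformer output."""
    return ar_single(delay_and_sum(X, delays), order)
```

On clean, perfectly aligned channels all three estimators should agree closely; the differences discussed in the description arise only once each channel is filtered by its own room transfer function, which this sketch does not simulate.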