Synthesizing speech from Doppler signals

It has long been considered a desirable goal to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past approaches have been based on camera observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have synthesized speech from measurements taken by electromyographs (EMGs) and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of the facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising: we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
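The abstract describes recovering speech from Doppler frequency shifts of a tone reflected off the talker's moving face. As a rough, illustrative sketch of the scale of those shifts, the standard two-way Doppler formula can be evaluated; the 40 kHz carrier and 0.1 m/s surface velocity below are assumptions for illustration, not values taken from the paper:

```python
# Illustrative sketch only: two-way Doppler shift for a reflector moving
# toward a sonar that emits a continuous tone. Parameter values are assumed.

C_AIR = 343.0  # speed of sound in air at ~20 degrees C, m/s


def doppler_shift(v: float, f0: float, c: float = C_AIR) -> float:
    """Two-way Doppler shift in Hz for a surface moving at v m/s
    (positive = toward the sonar) illuminated by a tone at f0 Hz."""
    return 2.0 * v * f0 / c


# Articulator surfaces moving at roughly 0.1 m/s under an assumed 40 kHz
# ultrasonic carrier shift the reflected tone by only a few tens of Hz.
shift = doppler_shift(v=0.1, f0=40_000.0)
```

For motion at these speeds the reflected energy stays within a few tens of Hz of the carrier, which is why a narrowband tone incident on the face can carry a usable articulation signature.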

Detailed description

Saved in:
Bibliographic details
Main authors: Toth, A R, Kalgaonkar, K, Raj, B, Ezzat, T
Format: Conference proceedings
Language: English
Subjects:
Online access: Order full text
container_start_page 4638
container_end_page 4641
creator Toth, A R ; Kalgaonkar, K ; Raj, B ; Ezzat, T
description It has long been considered a desirable goal to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past approaches have been based on camera observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have synthesized speech from measurements taken by electromyographs (EMGs) and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of the facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising: we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
doi_str_mv 10.1109/ICASSP.2010.5495552
format Conference Proceeding
isbn 9781424442959 ; 1424442958
eisbn 9781424442966 ; 1424442966
publisher IEEE
date 2010-03
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4638-4641
issn 1520-6149
eissn 2379-190X
language eng
recordid cdi_ieee_primary_5495552
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Acoustic sensors
Cameras
Electromyography
Frequency
Loudspeakers
Radar detection
Signal synthesis
Sonar
Speech synthesis
Ultrasonic imaging
title Synthesizing speech from Doppler signals
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T13%3A40%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Synthesizing%20speech%20from%20Doppler%20signals&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Toth,%20A%20R&rft.date=2010-03&rft.spage=4638&rft.epage=4641&rft.pages=4638-4641&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424442959&rft.isbn_list=1424442958&rft_id=info:doi/10.1109/ICASSP.2010.5495552&rft_dat=%3Cieee_6IE%3E5495552%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424442966&rft.eisbn_list=1424442966&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5495552&rfr_iscdi=true