Synthesizing speech from Doppler signals
It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past approaches have been based on camera observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have synthesized speech from measurements taken by electromyographs and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising - we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
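The record itself contains no code, but the sensing step the abstract describes (an ultrasonic tone reflected off the talker's face, demodulated to expose Doppler shifts) can be illustrated with a short sketch. This is not the authors' implementation: the 40 kHz carrier, 96 kHz sampling rate, and 10 ms frame length below are assumptions chosen purely for illustration.

```python
import numpy as np

CARRIER_HZ = 40_000   # assumed ultrasonic tone frequency (not from the paper)
FS = 96_000           # assumed receiver sampling rate
FRAME = 960           # 10 ms analysis frames at 96 kHz

def doppler_features(received: np.ndarray) -> np.ndarray:
    """Return per-frame magnitude spectra of the complex baseband signal.

    Energy at 0 Hz corresponds to static reflections; energy at +/- offsets
    corresponds to Doppler shifts induced by facial motion.
    """
    t = np.arange(len(received)) / FS
    # Mix the received tone down to baseband with a complex exponential.
    baseband = received * np.exp(-2j * np.pi * CARRIER_HZ * t)
    # Crude moving-average lowpass to suppress the mixing image; a real
    # front end would use a proper anti-aliasing filter here.
    kernel = np.ones(32) / 32
    baseband = np.convolve(baseband, kernel, mode="same")
    n = len(baseband) // FRAME
    frames = baseband[: n * FRAME].reshape(n, FRAME) * np.hanning(FRAME)
    return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1))

# A static reflector concentrates energy in the centre (0 Hz) bin;
# articulator motion smears it toward the band edges.
t = np.arange(FS) / FS
feats = doppler_features(np.cos(2 * np.pi * CARRIER_HZ * t))
print(feats.shape)  # (100, 960): one Doppler spectrum per 10 ms frame
```

A statistical model would then map these per-frame spectra to speech parameters; a similarly hedged sketch of that inference step follows the record fields below.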
Saved in:
Main authors: Toth, A R; Kalgaonkar, K; Raj, B; Ezzat, T
Format: Conference Proceeding
Language: English
Subjects: Acoustic sensors; Cameras; Electromyography; Frequency; Loudspeakers; Radar detection; Signal synthesis; Sonar; Speech synthesis; Ultrasonic imaging
Online access: Order full text
container_end_page | 4641 |
container_issue | |
container_start_page | 4638 |
container_title | 2010 IEEE International Conference on Acoustics, Speech and Signal Processing |
container_volume | |
creator | Toth, A R; Kalgaonkar, K; Raj, B; Ezzat, T |
description | It has long been considered a desirable goal to be able to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past approaches have been based on camera observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have synthesized speech from measurements taken by electromyographs and other devices that are tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of facial motion associated with speech - a Doppler sonar. Facial movement is characterized through Doppler frequency shifts in a tone that is incident on the talker's face. These frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar acting from the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising - we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs. |
doi_str_mv | 10.1109/ICASSP.2010.5495552 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149; EISSN: 2379-190X; ISBN: 9781424442959, 1424442958; EISBN: 9781424442966, 1424442966; DOI: 10.1109/ICASSP.2010.5495552 |
ispartof | 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4638-4641 |
issn | 1520-6149 2379-190X |
language | eng |
recordid | cdi_ieee_primary_5495552 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Acoustic sensors; Cameras; Electromyography; Frequency; Loudspeakers; Radar detection; Signal synthesis; Sonar; Speech synthesis; Ultrasonic imaging |
title | Synthesizing speech from Doppler signals |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T13%3A40%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Synthesizing%20speech%20from%20Doppler%20signals&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Toth,%20A%20R&rft.date=2010-03&rft.spage=4638&rft.epage=4641&rft.pages=4638-4641&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424442959&rft.isbn_list=1424442958&rft_id=info:doi/10.1109/ICASSP.2010.5495552&rft_dat=%3Cieee_6IE%3E5495552%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424442966&rft.eisbn_list=1424442966&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5495552&rfr_iscdi=true |
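The abstract states only that the Doppler frequency shifts "are used to infer the underlying speech signal" via statistical methods; the paper's actual model is not reproduced in this record. As a stand-in, the sketch below fits a frame-wise linear least-squares map from Doppler features to speech spectral features. Everything here is hypothetical: the feature dimensions are invented, and random data replaces the parallel sonar/speech recordings a real system would train on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training data: in practice X would hold per-frame Doppler
# spectra (e.g. from doppler_features above) and Y the corresponding
# speech spectral features from a parallel microphone recording.
n_frames, n_doppler, n_speech = 2000, 64, 40
X = rng.standard_normal((n_frames, n_doppler))        # Doppler features
W_true = rng.standard_normal((n_doppler, n_speech))   # synthetic ground truth
Y = X @ W_true + 0.1 * rng.standard_normal((n_frames, n_speech))

# Fit W minimizing ||X W - Y||^2 with ordinary least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predict speech features for unseen Doppler frames; a vocoder would
# then turn these frame-wise features into a waveform.
X_test = rng.standard_normal((10, n_doppler))
Y_hat = X_test @ W
print(Y_hat.shape)  # (10, 40): one predicted speech spectrum per frame
```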