StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin

We introduce StethoSpeech, a silent speech interface that transforms flesh-conducted vibrations behind the ear into speech. This innovation is designed to improve social interactions for people with voice disorders and, furthermore, to enable discreet communication in public. Unlike prior efforts, StethoSpeech requires neither (a) paired speech data for the recorded vibrations nor (b) a specialized device for recording them, as it works with an off-the-shelf clinical stethoscope. The novelty of our framework lies in the overall design, the simulation of ground-truth speech, and a sequence-to-sequence translation network that operates in the latent space. We present comprehensive experiments on the existing CSTR NAM TIMIT Plus corpus and on our proposed StethoText, a large-scale synchronized database of non-audible murmur and text for speech research. Our results show that StethoSpeech produces natural-sounding and intelligible speech, significantly outperforming existing methods on several quantitative and qualitative metrics. We also demonstrate that it generalizes to speakers not encountered during training and remains effective in challenging, noisy environments. Speech samples are available at https://stethospeech.github.io/StethoSpeech/.
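The abstract's key technical component is a sequence-to-sequence network that translates vibration features captured by the stethoscope into speech representations in a latent space. The sketch below is a minimal illustration of that idea only: the module name NamToSpeechLatent, the feature dimensions, and the convolution-plus-BiLSTM architecture are our own assumptions, not the authors' published design.

```python
# Minimal sketch of a NAM-to-speech latent translator (illustrative only;
# not the StethoSpeech authors' model). Dimensions and layers are assumed.
import torch
import torch.nn as nn

class NamToSpeechLatent(nn.Module):
    """Maps a sequence of NAM feature frames to same-length speech latents."""

    def __init__(self, nam_dim=80, latent_dim=256, hidden=512):
        super().__init__()
        # Local smoothing over noisy stethoscope-recorded frames.
        self.encoder = nn.Sequential(
            nn.Conv1d(nam_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Long-range sequence modeling across the utterance.
        self.rnn = nn.LSTM(hidden, hidden // 2, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Projection into the assumed speech latent space.
        self.proj = nn.Linear(hidden, latent_dim)

    def forward(self, nam):  # nam: (batch, time, nam_dim)
        x = self.encoder(nam.transpose(1, 2)).transpose(1, 2)
        x, _ = self.rnn(x)
        return self.proj(x)  # (batch, time, latent_dim)

# Example: a 2-second clip at 100 frames/s with 80-dim NAM features.
model = NamToSpeechLatent()
latents = model(torch.randn(1, 200, 80))  # -> shape (1, 200, 256)
```

In a full pipeline such as the one the abstract describes, the predicted latents would then be rendered to a waveform by a neural vocoder; that stage is omitted from this sketch.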

Bibliographic Details
Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2024-09, Vol. 8 (3), pp. 1-21, Article 123
Authors: Shah, Neil; Sahipjohn, Neha; Tambrahalli, Vishal; Subramanian, Ramanathan; Gandhi, Vineet
Format: Article
Language: English
Subjects: Accessibility; Computing methodologies; Human-centered computing; Machine learning
Online Access: Full text
DOI: 10.1145/3678515
ISSN: 2474-9567
EISSN: 2474-9567
Publisher: ACM, New York, NY, USA
Source: ACM Digital Library Complete