StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin
We introduce StethoSpeech, a silent speech interface that transforms flesh-conducted vibrations behind the ear into speech. This innovation is designed to improve social interactions for those with voice disorders and, furthermore, to enable discreet public communication. Unlike prior efforts, StethoSpeech does not require (a) paired-speech data for recorded vibrations or (b) a specialized device for recording vibrations, as it can work with an off-the-shelf clinical stethoscope. The novelty of our framework lies in the overall design, the simulation of the ground-truth speech, and a sequence-to-sequence translation network, which works in the latent space. We present comprehensive experiments on the existing CSTR NAM TIMIT Plus corpus and our proposed StethoText: a large-scale synchronized database of non-audible murmur and text for speech research. Our results show that StethoSpeech provides natural-sounding and intelligible speech, significantly outperforming existing methods on several quantitative and qualitative metrics. Additionally, we showcase its capacity to extend its application to speakers not encountered during training and its effectiveness in challenging, noisy environments. Speech samples are available at https://stethospeech.github.io/StethoSpeech/.
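The record carries no implementation detail beyond the abstract's mention of a sequence-to-sequence translation network operating in a latent space. Purely as an illustration of that idea, the following is a minimal PyTorch sketch of such a translator, mapping non-audible-murmur (NAM) feature frames to speech feature frames through a latent sequence. Every name, dimension, and layer choice here is a hypothetical assumption, not the authors' architecture.

```python
import torch
import torch.nn as nn


class LatentSeq2Seq(nn.Module):
    """Toy NAM-to-speech translator (illustrative only): encode murmur
    feature frames into a latent sequence, then decode that latent
    sequence into speech feature frames."""

    def __init__(self, nam_dim=80, speech_dim=80, latent_dim=256):
        super().__init__()
        # Encoder maps NAM frames to latent vectors; bidirectional,
        # so its output is 2 * latent_dim wide.
        self.encoder = nn.GRU(nam_dim, latent_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.to_latent = nn.Linear(2 * latent_dim, latent_dim)
        # Decoder translates the latent sequence into speech frames.
        self.decoder = nn.GRU(latent_dim, latent_dim, num_layers=2,
                              batch_first=True)
        self.to_speech = nn.Linear(latent_dim, speech_dim)

    def forward(self, nam_feats):
        # nam_feats: (batch, time, nam_dim)
        enc_out, _ = self.encoder(nam_feats)
        latent = self.to_latent(enc_out)    # (batch, time, latent_dim)
        dec_out, _ = self.decoder(latent)
        return self.to_speech(dec_out)      # (batch, time, speech_dim)


if __name__ == "__main__":
    model = LatentSeq2Seq()
    nam = torch.randn(4, 200, 80)           # 4 clips, 200 frames each
    speech = model(nam)
    print(speech.shape)                      # torch.Size([4, 200, 80])
```

In a real system the decoded speech features would be passed to a vocoder to produce a waveform; this sketch stops at the feature level.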
Published in: Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-09, Vol.8 (3), p.1-21, Article 123
Main authors: Shah, Neil; Sahipjohn, Neha; Tambrahalli, Vishal; Subramanian, Ramanathan; Gandhi, Vineet
Format: Article
Language: English
Subjects: Accessibility; Computing methodologies; Human-centered computing; Machine learning
Online access: Full text
| Field | Value |
| --- | --- |
| container_title | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies |
| container_volume | 8 |
| container_issue | 3 |
| container_start_page | 1 |
| container_end_page | 21 |
| creator | Shah, Neil; Sahipjohn, Neha; Tambrahalli, Vishal; Subramanian, Ramanathan; Gandhi, Vineet |
| doi | 10.1145/3678515 |
| format | Article |
| publisher | ACM (New York, NY, USA) |
| orcid | 0000-0001-9441-7074; 0009-0009-1101-8701; 0009-0008-1396-1106; 0000-0001-8861-7731; 0000-0002-7517-3673 |
| fulltext | fulltext |
| identifier | ISSN: 2474-9567 |
| issn | 2474-9567; EISSN: 2474-9567 |
| ispartof | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-09, Vol.8 (3), p.1-21, Article 123 |
| language | eng |
| recordid | cdi_crossref_primary_10_1145_3678515 |
| source | ACM Digital Library Complete |
| subjects | Accessibility; Computing methodologies; Human-centered computing; Machine learning |
| title | StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin |