Multimodal Segmentation for Vocal Tract Modeling

Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at \url{rishiraij.github.io/multimodal-mri-avatar/}.
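The abstract describes fusing audio with RT-MRI video frames to improve articulator segmentation. As a minimal, hypothetical sketch of that idea (not the paper's actual architecture), a per-frame audio embedding can be broadcast across spatial positions and concatenated with visual features before a per-pixel segmentation head; the shapes, feature sizes, and the linear head below are all illustrative assumptions:

```python
import numpy as np

def fuse_audio_visual(frame_feats, audio_feats):
    """Broadcast a per-frame audio embedding over all spatial positions
    and concatenate it with the visual feature map (a common early-fusion
    pattern; the paper's exact method may differ)."""
    h, w, _ = frame_feats.shape
    ca = audio_feats.shape[0]
    tiled = np.broadcast_to(audio_feats, (h, w, ca))
    return np.concatenate([frame_feats, tiled], axis=-1)

def segment(fused, weights, bias):
    """Toy per-pixel linear classifier over the fused features,
    standing in for a real segmentation head."""
    logits = fused @ weights + bias   # shape (H, W, n_classes)
    return logits.argmax(axis=-1)     # per-pixel articulator label

rng = np.random.default_rng(0)
frame = rng.normal(size=(8, 8, 4))    # visual features for one RT-MRI frame
audio = rng.normal(size=(3,))         # synchronized audio embedding
fused = fuse_audio_visual(frame, audio)         # shape (8, 8, 7)
labels = segment(fused, rng.normal(size=(7, 5)), np.zeros(5))  # shape (8, 8)
```

The point of the fusion step is that acoustic evidence (e.g. which phone is being produced) can disambiguate articulator boundaries that are visually unclear in low-contrast MRI frames.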

Bibliographic details
Published in: arXiv.org, 2024-06
Main authors: Jain, Rishi; Bohan, Yu; Wu, Peter; Prabhune, Tejas; Anumanchipalli, Gopala
Format: Article
Language: English
Subjects: Algorithms; Avatars; Datasets; Image segmentation; Labeling; Labels; Linguistics; Magnetic resonance imaging; Modelling; Motion capture; Speech processing; Vocal tract
Online access: Full text
container_title arXiv.org
creator Jain, Rishi; Bohan, Yu; Wu, Peter; Prabhune, Tejas; Anumanchipalli, Gopala
description Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at \url{rishiraij.github.io/multimodal-mri-avatar/}.
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-06
issn 2331-8422
language eng
recordid cdi_proquest_journals_3072055503
source Freely Accessible Journals
subjects Algorithms
Avatars
Datasets
Image segmentation
Labeling
Labels
Linguistics
Magnetic resonance imaging
Modelling
Motion capture
Speech processing
Vocal tract
title Multimodal Segmentation for Vocal Tract Modeling