Multimodal Segmentation for Vocal Tract Modeling
Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at rishiraij.github.io/multimodal-mri-avatar/.
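The abstract describes two components: a vision-only segmentation pass over RT-MRI frames and a multimodal step that uses audio to improve articulator segmentation. The snippet below is a minimal, hypothetical sketch of the general idea behind the second component (conditioning a per-frame segmentation network on a time-aligned audio embedding, here via FiLM-style feature modulation); it is not the authors' architecture, and all module names, shapes, and the fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch: audio-conditioned segmentation of RT-MRI frames.
# Not the paper's architecture; module names, shapes, and the fusion
# scheme (FiLM-style conditioning) are illustrative assumptions.
import torch
import torch.nn as nn

class AudioConditionedSegmenter(nn.Module):
    def __init__(self, n_classes: int = 4, audio_dim: int = 128, ch: int = 32):
        super().__init__()
        # Vision encoder: a single RT-MRI frame -> feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        # Audio embedding (e.g. from a pretrained speech model) -> per-channel
        # scale and shift applied to the visual features.
        self.film = nn.Linear(audio_dim, 2 * ch)
        # Decoder: fused features -> per-pixel articulator class logits.
        self.decoder = nn.Conv2d(ch, n_classes, 1)

    def forward(self, frame: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # frame: (B, 1, H, W); audio_emb: (B, audio_dim), time-aligned to the frame.
        feats = self.encoder(frame)
        scale, shift = self.film(audio_emb).chunk(2, dim=-1)
        feats = feats * scale[:, :, None, None] + shift[:, :, None, None]
        return self.decoder(feats)  # (B, n_classes, H, W) segmentation logits

# Smoke test with random tensors standing in for real RT-MRI frames and audio.
model = AudioConditionedSegmenter()
logits = model(torch.randn(2, 1, 84, 84), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 4, 84, 84])
```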
Saved in:
Published in: | arXiv.org, 2024-06 |
---|---|
Main authors: | Jain, Rishi; Bohan, Yu; Wu, Peter; Prabhune, Tejas; Anumanchipalli, Gopala |
Format: | Article |
Language: | eng |
Publisher: | Ithaca: Cornell University Library, arXiv.org |
EISSN: | 2331-8422 |
Subjects: | Algorithms; Avatars; Datasets; Image segmentation; Labeling; Labels; Linguistics; Magnetic resonance imaging; Modelling; Motion capture; Speech processing; Vocal tract |
Online access: | Full text |