Unsupervised visualization of Under-resourced speech prosody
In this paper, an unsupervised visualization framework for analyzing under-resourced speech prosody is proposed. An experiment was carried out for Ibibio, a Lower Cross language of the New Benue Congo family spoken in the southeast coastal region of Nigeria, West Africa. The proposed methodology adopts machine learning, with a semi-automated procedure for extracting prosodic features from a translated, prosodically stable corpus, 'The Tiger and the Mouse', a text corpus that demonstrates the prosody of read-aloud English. A self-organizing map (SOM) was used to learn the classification of certain input vectors (speech duration, fundamental frequency (F0), phoneme pattern (vowels only), and tone pattern) and to provide visualization of the cluster structure. Results obtained from the experiment showed that duration and F0 features realized from speech syllables are indispensable for modeling phoneme and tone patterns, with the tone input classes revealing clusters with better-separated boundaries and more evenly distributed component planes than the phoneme input classes. Further, except for very few outliers, the map weights were well distributed, with proper neighboring-neuron connections across the input space. A possible future work to advance this research is the development of the language's corpus, for the discovery of prosodic patterns in expressive speech.
Saved in:
Published in: | Speech communication 2018-07, Vol.101, p.45-56 |
---|---|
Main authors: | Ekpenyong, Moses; Inyang, Udoinyang; Udoh, EmemObong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 56 |
---|---|
container_issue | |
container_start_page | 45 |
container_title | Speech communication |
container_volume | 101 |
creator | Ekpenyong, Moses; Inyang, Udoinyang; Udoh, EmemObong |
description | In this paper, an unsupervised visualization framework for analyzing under-resourced speech prosody is proposed. An experiment was carried out for Ibibio, a Lower Cross language of the New Benue Congo family spoken in the southeast coastal region of Nigeria, West Africa. The proposed methodology adopts machine learning, with a semi-automated procedure for extracting prosodic features from a translated, prosodically stable corpus, 'The Tiger and the Mouse', a text corpus that demonstrates the prosody of read-aloud English. A self-organizing map (SOM) was used to learn the classification of certain input vectors (speech duration, fundamental frequency (F0), phoneme pattern (vowels only), and tone pattern) and to provide visualization of the cluster structure. Results obtained from the experiment showed that duration and F0 features realized from speech syllables are indispensable for modeling phoneme and tone patterns, with the tone input classes revealing clusters with better-separated boundaries and more evenly distributed component planes than the phoneme input classes. Further, except for very few outliers, the map weights were well distributed, with proper neighboring-neuron connections across the input space. A possible future work to advance this research is the development of the language's corpus, for the discovery of prosodic patterns in expressive speech. |
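The SOM workflow summarized in the abstract (syllable-level duration, F0, vowel and tone codes clustered on a 2-D map, then inspected through U-matrix and component-plane views) can be illustrated with a small self-contained sketch. The NumPy code below is not the authors' implementation: the feature values are synthetic placeholders standing in for the Ibibio corpus features, and the grid size, learning-rate schedule, and neighborhood radius are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-syllable feature vectors: [duration (s), F0 (Hz), vowel code, tone code].
# Synthetic placeholders only, not the Ibibio corpus data used in the paper.
n = 200
data = np.column_stack([
    rng.uniform(0.05, 0.35, n),    # syllable duration
    rng.uniform(90.0, 300.0, n),   # fundamental frequency (F0)
    rng.integers(0, 5, n),         # coded vowel class
    rng.integers(0, 3, n),         # coded tone class
]).astype(float)
data = (data - data.mean(axis=0)) / data.std(axis=0)  # z-score normalization

# Small SOM grid; each unit holds a weight vector in the feature space.
rows, cols, dim = 8, 8, data.shape[1]
weights = rng.normal(size=(rows, cols, dim))
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

def train_som(data, weights, epochs=20, lr0=0.5, sigma0=3.0):
    """Online SOM training with exponentially decaying learning rate and radius."""
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * np.exp(-step / total)
            sigma = sigma0 * np.exp(-step / total)
            # Best-matching unit (BMU): the unit whose weights are closest to x.
            bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))
            # Gaussian neighborhood on the map grid, centered on the BMU.
            d2 = ((grid - np.array(bmu)) ** 2).sum(-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))[..., None]
            weights += lr * h * (x - weights)
            step += 1
    return weights

weights = train_som(data, weights)

# U-matrix: mean distance from each unit's weights to its grid neighbors.
# Ridges of high values mark cluster boundaries on the trained map.
umatrix = np.zeros((rows, cols))
for i in range(rows):
    for j in range(cols):
        neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        dists = [np.linalg.norm(weights[i, j] - weights[a, b])
                 for a, b in neighbors if 0 <= a < rows and 0 <= b < cols]
        umatrix[i, j] = np.mean(dists)
print(np.round(umatrix, 2))
```

Component planes, the other visualization mentioned in the abstract, correspond to the per-feature slices `weights[:, :, k]` of the trained map and can be plotted one feature at a time.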
doi_str_mv | 10.1016/j.specom.2018.04.011 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0167-6393 |
ispartof | Speech communication, 2018-07, Vol.101, p.45-56 |
issn | 0167-6393 (print); 1872-7182 (electronic) |
language | eng |
recordid | cdi_proquest_journals_2100379782 |
source | Access via ScienceDirect (Elsevier) |
subjects | Artificial intelligence; Clusters; Coastal zone; Corpus linguistics; English language; Feature extraction; Fundamental frequency; Ibibio-Efik languages; Linguistics; Machine learning; Pattern analysis; Phonemes; Prosodic features; Resonant frequencies; Self organizing maps; Self-organizing map; Speech; Speech duration; Speech prosody; Syllables; Tone; Tone modeling; Translation; Visualization; Vowels |
title | Unsupervised visualization of Under-resourced speech prosody |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T15%3A25%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Unsupervised%20visualization%20of%20Under-resourced%20speech%20prosody&rft.jtitle=Speech%20communication&rft.au=Ekpenyong,%20Moses&rft.date=2018-07&rft.volume=101&rft.spage=45&rft.epage=56&rft.pages=45-56&rft.issn=0167-6393&rft.eissn=1872-7182&rft_id=info:doi/10.1016/j.specom.2018.04.011&rft_dat=%3Cproquest_cross%3E2100379782%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2100379782&rft_id=info:pmid/&rft_els_id=S016763931730225X&rfr_iscdi=true |