Unsupervised visualization of Under-resourced speech prosody

Bibliographic details
Published in: Speech communication 2018-07, Vol.101, p.45-56
Main authors: Ekpenyong, Moses; Inyang, Udoinyang; Udoh, EmemObong
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 56
container_issue
container_start_page 45
container_title Speech communication
container_volume 101
creator Ekpenyong, Moses
Inyang, Udoinyang
Udoh, EmemObong
description In this paper, an unsupervised visualization framework for analyzing under-resourced speech prosody is proposed. An experiment was carried out for Ibibio, a Lower Cross language of the New Benue-Congo family spoken in the southeast coastal region of Nigeria, West Africa. The proposed methodology adopts machine learning, with a semi-automated procedure for extracting prosodic features from a translated, prosodically stable corpus, ‘The Tiger and the Mouse’, a text corpus that demonstrates the prosody of read-aloud English. A self-organizing map (SOM) was used to learn the classification of the input vectors (speech duration, fundamental frequency (F0), phoneme pattern (vowels only), and tone pattern) and to provide a visualization of the cluster structure. Results from the experiment showed that duration and F0 features realized from speech syllables are indispensable for modeling phoneme and tone patterns, while the tone input classes revealed clusters with better-separated boundaries and more evenly distributed component planes than the phoneme input classes. Further, except for a few outliers, the map weights were well distributed, with proper neighboring-neuron connections across the input space. A possible direction for future work is the development of the language's corpus for the discovery of prosodic patterns in expressive speech.
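The abstract describes the method only at a high level. The following minimal sketch (not the authors' code) illustrates how syllable-level prosodic feature vectors of the kind described there (duration, F0, a vowel-class code, and a tone-class code) could be clustered with a small self-organizing map written in plain NumPy; the map size, training schedule, and example feature values are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols self-organizing map on data (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    weights = rng.random((rows, cols, n_features))            # random initial codebook
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        lr = lr0 * np.exp(-t / n_iter)                        # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)                  # shrinking neighborhood radius
        x = data[rng.integers(len(data))]                     # draw one random sample
        # best-matching unit: the neuron whose weight vector is closest to x
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=-1)), (rows, cols))
        # Gaussian neighborhood on the map grid, centered on the BMU
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)                     # pull BMU and neighbors toward x
    return weights

# Hypothetical per-syllable features: [duration (s), mean F0 (Hz), vowel code, tone code]
syllables = np.array([
    [0.12, 210.0, 1, 0],
    [0.18, 150.0, 2, 1],
    [0.10, 205.0, 1, 0],
    [0.21, 145.0, 3, 1],
], dtype=float)
# z-score each feature so duration (seconds) and F0 (Hz) are on comparable scales
syllables = (syllables - syllables.mean(axis=0)) / syllables.std(axis=0)
codebook = train_som(syllables)
print(codebook.shape)   # (6, 6, 4): one prototype vector per map neuron
```

In a fuller setup one would also compute the U-matrix and per-feature component planes from the trained codebook to obtain the kind of cluster-boundary and component-plane visualizations the abstract refers to.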
doi_str_mv 10.1016/j.specom.2018.04.011
format Article
publisher Amsterdam: Elsevier B.V
rights 2018 Elsevier B.V.
rights Copyright Elsevier Science Ltd., Jul 2018
orcidid https://orcid.org/0000-0001-6774-5259
fulltext fulltext
identifier ISSN: 0167-6393
ispartof Speech communication, 2018-07, Vol.101, p.45-56
issn 0167-6393
1872-7182
language eng
recordid cdi_proquest_journals_2100379782
source Access via ScienceDirect (Elsevier)
subjects Artificial intelligence
Clusters
Coastal zone
Corpus linguistics
English language
Feature extraction
Fundamental frequency
Ibibio-Efik languages
Linguistics
Machine learning
Pattern analysis
Phonemes
Prosodic features
Resonant frequencies
Self organizing maps
Self-organizing map
Speech
Speech duration
Speech prosody
Syllables
Tone
Tone modeling
Translation
Visualization
Vowels
title Unsupervised visualization of Under-resourced speech prosody
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T15%3A25%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Unsupervised%20visualization%20of%20Under-resourced%20speech%20prosody&rft.jtitle=Speech%20communication&rft.au=Ekpenyong,%20Moses&rft.date=2018-07&rft.volume=101&rft.spage=45&rft.epage=56&rft.pages=45-56&rft.issn=0167-6393&rft.eissn=1872-7182&rft_id=info:doi/10.1016/j.specom.2018.04.011&rft_dat=%3Cproquest_cross%3E2100379782%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2100379782&rft_id=info:pmid/&rft_els_id=S016763931730225X&rfr_iscdi=true