CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services

Bibliographic details
Published in:	IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12
Main authors:	Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen
Format:	Article
Language:	eng
Subjects:	Banking ; Constraint-based Unsupervised Prosody Transfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness

container_end_page	12
container_issue
container_start_page	1
container_title	IEEE/ACM transactions on audio, speech, and language processing
container_volume	31
creator	Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen
description	Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three major challenges, because practical telephone banking services require natural, high-quality conversations: the lack of parallel speech data, the difficulty of generating natural speech, and the difficulty of modeling long speech. To tackle these challenges, we propose a novel unsupervised prosody transfer model, built on well-grounded theoretical foundations, for improving customer satisfaction in telephone conversations. Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to reduce manual feature engineering and training costs and to decompose extremely long speech without parallel data. (ii) The forge module focuses on converting the source prosody to the target prosody and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments on large-scale telephone recordings from XWbank in China suggest that our model achieves promising results. Moreover, we open-source our code and unique datasets on GitHub.
doi_str_mv	10.1109/TASLP.2023.3293042
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2839515934</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10174656</ieee_id><sourcerecordid>2839515934</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</originalsourceid><addsrcrecordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2839515934</pqid></control><display><type>article</type><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><source>IEEE Electronic Library (IEL)</source><creator>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creator><creatorcontrib>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creatorcontrib><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2023.3293042</identifier><identifier>CODEN: ITASFA</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Banking ; Constraint-based Unsupervised Prosody Tansfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</cites><orcidid>0000-0001-9833-1630 ; 0000-0002-1221-6023 ; 0000-0002-2613-2752</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27922,27923,54756</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. Our model consists of a solo-encoding disentanglement module and a forge module. 
(i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><subject>Banking</subject><subject>Constraint-based Unsupervised Prosody Tansfer</subject><subject>Customer satisfaction</subject><subject>Customer services</subject><subject>Data models</subject><subject>Feature extraction</subject><subject>Hammers</subject><subject>Linguistics</subject><subject>Modules</subject><subject>Prosody</subject><subject>Rhythm</subject><subject>Speech</subject><subject>Telephone banking</subject><subject>Telephone Banking Services</subject><subject>Telephone sets</subject><subject>Telephones</subject><subject>Timbre</subject><subject>Uniqueness</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Ben, Liu</creator><creator>Jun, Wang</creator><creator>Guanyuan, Yu</creator><creator>Shaolei, 
Chen</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7T9</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></search><sort><creationdate>20230101</creationdate><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><author>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Banking</topic><topic>Constraint-based Unsupervised Prosody Tansfer</topic><topic>Customer satisfaction</topic><topic>Customer services</topic><topic>Data models</topic><topic>Feature extraction</topic><topic>Hammers</topic><topic>Linguistics</topic><topic>Modules</topic><topic>Prosody</topic><topic>Rhythm</topic><topic>Speech</topic><topic>Telephone banking</topic><topic>Telephone Banking Services</topic><topic>Telephone sets</topic><topic>Telephones</topic><topic>Timbre</topic><topic>Uniqueness</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ben, Liu</au><au>Jun, Wang</au><au>Guanyuan, Yu</au><au>Shaolei, Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>31</volume><spage>1</spage><epage>12</epage><pages>1-12</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASFA</coden><abstract>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2023.3293042</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2329-9290
ispartof	IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12
issn	2329-9290 2329-9304
language	eng
recordid	cdi_proquest_journals_2839515934
source	IEEE Electronic Library (IEL)
subjects	Banking ; Constraint-based Unsupervised Prosody Transfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness
title	CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T08%3A11%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CUPVC:%20A%20Constraint-based%20Unsupervised%20Prosody%20Transfer%20for%20Improving%20Telephone%20Banking%20Services&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Ben,%20Liu&rft.date=2023-01-01&rft.volume=31&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3293042&rft_dat=%3Cproquest_RIE%3E2839515934%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2839515934&rft_id=info:pmid/&rft_ieee_id=10174656&rfr_iscdi=true