CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services

Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services...

Bibliographic details
Published in: IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12
Main authors: Ben, Liu, Jun, Wang, Guanyuan, Yu, Shaolei, Chen
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 12
container_issue
container_start_page 1
container_title IEEE/ACM transactions on audio, speech, and language processing
container_volume 31
creator Ben, Liu
Jun, Wang
Guanyuan, Yu
Shaolei, Chen
description Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.
doi_str_mv 10.1109/TASLP.2023.3293042
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2839515934</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10174656</ieee_id><sourcerecordid>2839515934</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</originalsourceid><addsrcrecordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2839515934</pqid></control><display><type>article</type><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><source>IEEE Electronic Library (IEL)</source><creator>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creator><creatorcontrib>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creatorcontrib><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2023.3293042</identifier><identifier>CODEN: ITASFA</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Banking ; Constraint-based Unsupervised Prosody Tansfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</cites><orcidid>0000-0001-9833-1630 ; 0000-0002-1221-6023 ; 0000-0002-2613-2752</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27922,27923,54756</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. Our model consists of a solo-encoding disentanglement module and a forge module. 
(i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><subject>Banking</subject><subject>Constraint-based Unsupervised Prosody Tansfer</subject><subject>Customer satisfaction</subject><subject>Customer services</subject><subject>Data models</subject><subject>Feature extraction</subject><subject>Hammers</subject><subject>Linguistics</subject><subject>Modules</subject><subject>Prosody</subject><subject>Rhythm</subject><subject>Speech</subject><subject>Telephone banking</subject><subject>Telephone Banking Services</subject><subject>Telephone sets</subject><subject>Telephones</subject><subject>Timbre</subject><subject>Uniqueness</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Ben, Liu</creator><creator>Jun, Wang</creator><creator>Guanyuan, Yu</creator><creator>Shaolei, 
Chen</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7T9</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></search><sort><creationdate>20230101</creationdate><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><author>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Banking</topic><topic>Constraint-based Unsupervised Prosody Tansfer</topic><topic>Customer satisfaction</topic><topic>Customer services</topic><topic>Data models</topic><topic>Feature extraction</topic><topic>Hammers</topic><topic>Linguistics</topic><topic>Modules</topic><topic>Prosody</topic><topic>Rhythm</topic><topic>Speech</topic><topic>Telephone banking</topic><topic>Telephone Banking Services</topic><topic>Telephone sets</topic><topic>Telephones</topic><topic>Timbre</topic><topic>Uniqueness</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ben, Liu</au><au>Jun, Wang</au><au>Guanyuan, Yu</au><au>Shaolei, Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>31</volume><spage>1</spage><epage>12</epage><pages>1-12</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASFA</coden><abstract>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2023.3293042</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2329-9290
ispartof IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12
issn 2329-9290
2329-9304
language eng
recordid cdi_proquest_journals_2839515934
source IEEE Electronic Library (IEL)
subjects Banking
Constraint-based Unsupervised Prosody Transfer
Customer satisfaction
Customer services
Data models
Feature extraction
Hammers
Linguistics
Modules
Prosody
Rhythm
Speech
Telephone banking
Telephone Banking Services
Telephone sets
Telephones
Timbre
Uniqueness
title CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T08%3A11%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CUPVC:%20A%20Constraint-based%20Unsupervised%20Prosody%20Transfer%20for%20Improving%20Telephone%20Banking%20Services&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Ben,%20Liu&rft.date=2023-01-01&rft.volume=31&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3293042&rft_dat=%3Cproquest_RIE%3E2839515934%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2839515934&rft_id=info:pmid/&rft_ieee_id=10174656&rfr_iscdi=true