CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services
Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services...
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12 |
---|---|
Main authors: | Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 12 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 31 |
creator | Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen |
description | Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub. |
doi_str_mv | 10.1109/TASLP.2023.3293042 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2839515934</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10174656</ieee_id><sourcerecordid>2839515934</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</originalsourceid><addsrcrecordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2839515934</pqid></control><display><type>article</type><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><source>IEEE Electronic Library (IEL)</source><creator>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creator><creatorcontrib>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</creatorcontrib><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2023.3293042</identifier><identifier>CODEN: ITASFA</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Banking ; Constraint-based Unsupervised Prosody Tansfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</cites><orcidid>0000-0001-9833-1630 ; 0000-0002-1221-6023 ; 0000-0002-2613-2752</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27922,27923,54756</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10174656$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. Our model consists of a solo-encoding disentanglement module and a forge module. 
(i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</description><subject>Banking</subject><subject>Constraint-based Unsupervised Prosody Tansfer</subject><subject>Customer satisfaction</subject><subject>Customer services</subject><subject>Data models</subject><subject>Feature extraction</subject><subject>Hammers</subject><subject>Linguistics</subject><subject>Modules</subject><subject>Prosody</subject><subject>Rhythm</subject><subject>Speech</subject><subject>Telephone banking</subject><subject>Telephone Banking Services</subject><subject>Telephone sets</subject><subject>Telephones</subject><subject>Timbre</subject><subject>Uniqueness</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNUMlOwzAUtBBIVKU_gDhY4pziJYvNrUQslSpRqSlX4zjPkNI6xU4r9e9JaJE4vZmnmbcMQteUjCkl8q6YLGbzMSOMjzmTnMTsDA1YB6OenP9hJsklGoWwIoRQkkmZxQP0ni_nb_k9nuC8caH1unZtVOoAFV66sNuC39c9mfsmNNUBF167YMFj23g83Wx9s6_dBy5gDdvPxgF-0O6r7yx6p4FwhS6sXgcYneoQLZ8ei_wlmr0-T_PJLDIsztpI8CqxIiOspLYiIKpE6LTrMGOtKCk1RttMGl5yUQITpZWp1CZObKnjzLKED9HtcW530vcOQqtWzc67bqVigsuEJpLHnYodVab7J3iwauvrjfYHRYnqw1S_Yao-THUKszPdHE01APwz0CxOk5T_AKF5cZE</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Ben, Liu</creator><creator>Jun, Wang</creator><creator>Guanyuan, Yu</creator><creator>Shaolei, 
Chen</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7T9</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></search><sort><creationdate>20230101</creationdate><title>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</title><author>Ben, Liu ; Jun, Wang ; Guanyuan, Yu ; Shaolei, Chen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-83d5f8702b1fd0e8d58a65f82cff8b11ccaf79c3b38be28bf969ac45fba47f253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Banking</topic><topic>Constraint-based Unsupervised Prosody Tansfer</topic><topic>Customer satisfaction</topic><topic>Customer services</topic><topic>Data models</topic><topic>Feature extraction</topic><topic>Hammers</topic><topic>Linguistics</topic><topic>Modules</topic><topic>Prosody</topic><topic>Rhythm</topic><topic>Speech</topic><topic>Telephone banking</topic><topic>Telephone Banking Services</topic><topic>Telephone sets</topic><topic>Telephones</topic><topic>Timbre</topic><topic>Uniqueness</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ben, Liu</creatorcontrib><creatorcontrib>Jun, Wang</creatorcontrib><creatorcontrib>Guanyuan, Yu</creatorcontrib><creatorcontrib>Shaolei, Chen</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ben, Liu</au><au>Jun, Wang</au><au>Guanyuan, Yu</au><au>Shaolei, Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>31</volume><spage>1</spage><epage>12</epage><pages>1-12</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASFA</coden><abstract>Low efficiency in telephone banking services reduces customer satisfaction. Therefore, some recent studies have concentrated on applying voice conversion models to improve telephone banking services. However, building such a model raises three huge challenges, as practical telephone banking services require natural and high-quality conversations. These challenges include the lack of parallel speech data, difficulty in generating natural speech, and difficulty in modeling long speech. To tackle such challenges, we propose a novel unsupervised prosody transfer for improving customer satisfaction in telephone conversations relying on grounded theoretical foundations. 
Our model consists of a solo-encoding disentanglement module and a forge module. (i) The disentanglement module uses three unique constraints to effectively reduce manual feature engineering and training costs and decompose extremely long speech without parallel data. (ii) The forge module hammers at converting the source prosody to the target one and guarantees correct fine-grained alignments, thereby generating natural speech. Finally, extensive experiments are conducted on large-scale telephone recordings from XWbank in China and suggest that our model can achieve promising outcomes. Moreover, we open-source our codes and unique datasets on GitHub.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2023.3293042</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9833-1630</orcidid><orcidid>https://orcid.org/0000-0002-1221-6023</orcidid><orcidid>https://orcid.org/0000-0002-2613-2752</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2023-01, Vol.31, p.1-12 |
issn | 2329-9290 2329-9304 |
language | eng |
recordid | cdi_proquest_journals_2839515934 |
source | IEEE Electronic Library (IEL) |
subjects | Banking ; Constraint-based Unsupervised Prosody Transfer ; Customer satisfaction ; Customer services ; Data models ; Feature extraction ; Hammers ; Linguistics ; Modules ; Prosody ; Rhythm ; Speech ; Telephone banking ; Telephone Banking Services ; Telephone sets ; Telephones ; Timbre ; Uniqueness |
title | CUPVC: A Constraint-based Unsupervised Prosody Transfer for Improving Telephone Banking Services |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T08%3A11%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CUPVC:%20A%20Constraint-based%20Unsupervised%20Prosody%20Transfer%20for%20Improving%20Telephone%20Banking%20Services&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Ben,%20Liu&rft.date=2023-01-01&rft.volume=31&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3293042&rft_dat=%3Cproquest_RIE%3E2839515934%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2839515934&rft_id=info:pmid/&rft_ieee_id=10174656&rfr_iscdi=true |