CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation

In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese pre-trained unbalanced transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the shared knowledge between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, together with the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With this partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with the two decoders and (2) be fine-tuned flexibly, fully exploiting the potential of the model. Moreover, the unbalanced transformer saves computational and storage cost, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT.
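
The abstract describes the shared-encoder, two-decoder layout only in prose; the following is a minimal, hypothetical PyTorch sketch of such an unbalanced design. The class name, layer counts (a deep encoder with two shallow decoders), hidden size, and vocabulary size are illustrative assumptions and do not reflect the authors' released implementation or pre-training code.

```python
# Minimal sketch of a CPT-style "unbalanced" transformer: a deep shared encoder
# feeding a shallow understanding decoder (MLM path) and a shallow generation
# decoder (DAE path). All hyperparameters below are assumed for illustration.
import torch
import torch.nn as nn


class CPTSketch(nn.Module):
    def __init__(self, vocab_size=21128, d_model=768, n_heads=12,
                 n_enc_layers=10, n_dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Deep shared encoder: most parameters live here.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_enc_layers)
        # Shallow understanding decoder: self-attention only, used for NLU/MLM.
        self.u_dec = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_dec_layers)
        # Shallow generation decoder: cross-attends to encoder output, used for NLG/DAE.
        # (Causal masking is omitted here for brevity.)
        self.g_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_dec_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids=None, mode="nlu"):
        memory = self.encoder(self.embed(src_ids))
        if mode == "nlu":                      # understanding path
            hidden = self.u_dec(memory)
        else:                                  # generation path
            hidden = self.g_dec(self.embed(tgt_ids), memory)
        return self.lm_head(hidden)


# Example: NLU path on a toy batch of token ids -> (2, 16, vocab_size) logits.
model = CPTSketch()
logits = model(torch.randint(0, 21128, (2, 16)), mode="nlu")
```

In NLU mode only the encoder and the shallow understanding decoder run; in generation mode the shallow generation decoder cross-attends to the encoder output, which is where the inference speed-up over a balanced encoder-decoder claimed in the abstract comes from.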

Bibliographic details
Published in: Science China. Information sciences, 2024-05, Vol. 67 (5), p. 152102, Article 152102
Authors: Shao, Yunfan; Geng, Zhichao; Liu, Yitao; Dai, Junqi; Yan, Hang; Yang, Fei; Li, Zhe; Bao, Hujun; Qiu, Xipeng
Format: Article
Language: English
Publisher: Science China Press, Beijing
Subjects: Coders; Computer Science; Decoders; Information Systems and Communication Service; Natural language; Research Paper; Speech recognition; Transformers
ISSN: 1674-733X
EISSN: 1869-1919
DOI: 10.1007/s11432-021-3536-5
Online access: Full text