Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision

Recently, pre-training multilingual language models has shown great potential in learning multilingual representation, a crucial topic of natural language processing. Prior works generally use a single mixed attention (MA) module, following TLM (Conneau and Lample, 2019), for attending to intra-lingual and cross-lingual contexts equivalently and simultaneously. In this paper, we propose a network named decomposed attention (DA) as a replacement of MA. The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intra-lingual and cross-lingual supervisions respectively. In addition, we introduce a language-adaptive re-weighting strategy during training to further boost the model's performance. Experiments on various cross-lingual natural language understanding (NLU) tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.
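The abstract describes the decomposed attention (DA) design only at a high level. As a rough illustration of the idea (not the authors' implementation), the following PyTorch sketch splits one attention layer into an intra-lingual attention (IA) that lets each token attend only to tokens of its own language and a cross-lingual attention (CA) that attends only to tokens of the other language, then combines the two outputs. The class name DecomposedAttentionLayer, the lang_ids input, the simple additive combination, and the masking scheme are all assumptions made for illustration; the paper's exact formulation and the language-adaptive re-weighting strategy are not shown here.

# Minimal sketch of "decomposed attention" as described in the abstract.
# Assumes TLM-style inputs: each sequence is a concatenated translation pair,
# so every row contains tokens from both languages (otherwise the CA mask
# would block every key for some queries).
import torch
import torch.nn as nn


class DecomposedAttentionLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        # Separate attention modules for intra- and cross-lingual supervision.
        self.intra_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, lang_ids: torch.Tensor) -> torch.Tensor:
        # x:        (batch, seq_len, d_model) hidden states of a translation pair.
        # lang_ids: (batch, seq_len) integer language id per token.
        same_lang = lang_ids.unsqueeze(1) == lang_ids.unsqueeze(2)  # (B, L, L)
        # For boolean attn_mask, True means "do not attend".
        intra_mask = ~same_lang  # IA: attend only to same-language tokens
        cross_mask = same_lang   # CA: attend only to other-language tokens
        # Expand masks per attention head: (B * n_heads, L, L).
        n_heads = self.intra_attn.num_heads
        intra_mask = intra_mask.repeat_interleave(n_heads, dim=0)
        cross_mask = cross_mask.repeat_interleave(n_heads, dim=0)
        ia_out, _ = self.intra_attn(x, x, x, attn_mask=intra_mask)
        ca_out, _ = self.cross_attn(x, x, x, attn_mask=cross_mask)
        # Assumed combination: residual sum of both supervisions, then LayerNorm.
        return self.norm(x + ia_out + ca_out)


# Usage example (shapes only):
# layer = DecomposedAttentionLayer()
# x = torch.randn(2, 10, 768)
# lang_ids = torch.tensor([[0] * 5 + [1] * 5] * 2)
# out = layer(x, lang_ids)  # (2, 10, 768)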

Bibliographic details
Main authors: Guo, Yinpeng; Li, Liangyou; Jiang, Xin; Liu, Qun
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language
Online access: Order full text
creator Guo, Yinpeng
Li, Liangyou
Jiang, Xin
Liu, Qun
description Recently, pre-training multilingual language models has shown great potential in learning multilingual representation, a crucial topic of natural language processing. Prior works generally use a single mixed attention (MA) module, following TLM (Conneau and Lample, 2019), for attending to intra-lingual and cross-lingual contexts equivalently and simultaneously. In this paper, we propose a network named decomposed attention (DA) as a replacement of MA. The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intra-lingual and cross-lingual supervisions respectively. In addition, we introduce a language-adaptive re-weighting strategy during training to further boost the model's performance. Experiments on various cross-lingual natural language understanding (NLU) tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.
doi_str_mv 10.48550/arxiv.2106.05166
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2106.05166
language eng
recordid cdi_arxiv_primary_2106_05166
source arXiv.org
subjects Computer Science - Computation and Language
title Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T03%3A13%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20Multilingual%20Representation%20for%20Natural%20Language%20Understanding%20with%20Enhanced%20Cross-Lingual%20Supervision&rft.au=Guo,%20Yinpeng&rft.date=2021-06-09&rft_id=info:doi/10.48550/arxiv.2106.05166&rft_dat=%3Carxiv_GOX%3E2106_05166%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true