Load What You Need: Smaller Versions of Multilingual BERT

Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle fewer languages, according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that maintain comparable results while reducing the total number of parameters by up to 45%. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that, unlike language reduction, distillation induced a 1.7% to 6% drop in overall accuracy on the XNLI data set. The presented models and code are publicly available.

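The abstract describes the core technique: keep only the vocabulary entries needed for the target languages, so that the oversized multilingual embeddings layer shrinks while the Transformer layers stay untouched. The sketch below is a minimal illustration of that idea, assuming the Hugging Face transformers library and PyTorch; the model name is real, but the two-sentence corpus and the exact slicing steps are illustrative, not the authors' released procedure.

    # Rough sketch: keep only the multilingual BERT tokens observed in a
    # target-language sample and slice the word-embedding matrix accordingly.
    # A real pipeline would also rebuild the tokenizer vocabulary so that
    # token ids match the reduced matrix.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    model = BertModel.from_pretrained("bert-base-multilingual-cased")

    # Illustrative target-language data; in practice, large monolingual corpora.
    corpus = ["Load what you need.", "Chargez ce dont vous avez besoin."]

    # Collect the token ids actually used, always keeping the special tokens.
    kept_ids = set(tokenizer.all_special_ids)
    for text in corpus:
        kept_ids.update(tokenizer(text)["input_ids"])
    kept_ids = sorted(kept_ids)

    # Slice the embedding matrix down to the kept vocabulary.
    old_embeddings = model.get_input_embeddings().weight.data
    new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
    new_embeddings.weight.data.copy_(old_embeddings[kept_ids])
    model.set_input_embeddings(new_embeddings)
    model.config.vocab_size = len(kept_ids)

    print(f"kept {len(kept_ids)} of {old_embeddings.size(0)} vocabulary entries")

Because the shared multilingual vocabulary accounts for most of the model's parameters, dropping the unused entries is what makes the up-to-45% parameter reduction reported in the abstract possible.
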
Bibliographic Details
Main Authors: Abdaoui, Amine; Pradel, Camille; Sigel, Grégoire
Format: Article
Language: English
Published in: SustaiNLP / EMNLP 2020
Date: 2020-10-12
DOI: 10.48550/arxiv.2010.05609
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
Source: arXiv.org
Online Access: https://arxiv.org/abs/2010.05609