Load What You Need: Smaller Versions of Multilingual BERT

Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle fewer languages, according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that maintain comparable results while reducing the total number of parameters by up to 45%. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that, unlike language reduction, distillation induced a 1.7% to 6% drop in overall accuracy on the XNLI data set. The presented models and code are publicly available.

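The abstract describes the core technique: keep only the vocabulary entries needed for the target languages, so that the oversized multilingual embeddings layer shrinks while the Transformer layers stay untouched. The sketch below is a minimal illustration of that idea, assuming the Hugging Face transformers library and PyTorch; the model name is real, but the two-sentence corpus and the exact slicing steps are illustrative, not the authors' released procedure.

    # Rough sketch: keep only the multilingual BERT tokens observed in a
    # target-language sample and slice the word-embedding matrix accordingly.
    # A real pipeline would also rebuild the tokenizer vocabulary so that
    # token ids match the reduced matrix.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    model = BertModel.from_pretrained("bert-base-multilingual-cased")

    # Illustrative target-language data; in practice, large monolingual corpora.
    corpus = ["Load what you need.", "Chargez ce dont vous avez besoin."]

    # Collect the token ids actually used, always keeping the special tokens.
    kept_ids = set(tokenizer.all_special_ids)
    for text in corpus:
        kept_ids.update(tokenizer(text)["input_ids"])
    kept_ids = sorted(kept_ids)

    # Slice the embedding matrix down to the kept vocabulary.
    old_embeddings = model.get_input_embeddings().weight.data
    new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
    new_embeddings.weight.data.copy_(old_embeddings[kept_ids])
    model.set_input_embeddings(new_embeddings)
    model.config.vocab_size = len(kept_ids)

    print(f"kept {len(kept_ids)} of {old_embeddings.size(0)} vocabulary entries")

Because the shared multilingual vocabulary accounts for most of the model's parameters, dropping the unused entries is what makes the up-to-45% parameter reduction reported in the abstract possible.
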
Bibliographic Details
Main Authors: Abdaoui, Amine; Pradel, Camille; Sigel, Grégoire
Format: Article
Language: English
Published in: SustaiNLP / EMNLP 2020
Date: 2020-10-12
DOI: 10.48550/arxiv.2010.05609
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
Source: arXiv.org
Online Access: https://arxiv.org/abs/2010.05609