Aya 23: Open Weight Releases to Further Multilingual Progress

This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
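
Since the abstract highlights the open-weight release of the 8B and 35B models, a minimal sketch of how one might load and query such weights with the Hugging Face transformers library follows. The repository ID "CohereForAI/aya-23-8B" and the example prompt are assumptions, not details given in the record; substitute whatever ID the weights are actually published under.

```python
# Minimal sketch: loading open multilingual chat-model weights with
# Hugging Face transformers. The repo ID below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed Hub repo ID for the 8B release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit an 8B model on one GPU
    device_map="auto",          # place layers on available devices automatically
)

# Aya 23 is instruction-tuned, so the tokenizer's chat template applies
# the turn markers the model expects.
messages = [
    {"role": "user", "content": "Translate to Turkish: The weather is nice today."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping in the 35B checkpoint would only require changing the repo ID, at the cost of proportionally more GPU memory.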

Bibliographic Details

Published in: arXiv.org, 2024-05
Main authors: Aryabumi, Viraat; Dang, John; Talupuru, Dwarak; Dash, Saurabh; Cairuz, David; Lin, Hangyu; Venkitesh, Bharat; Smith, Madeline; Campos, Jon Ander; Tan, Yi Chern; Marchisio, Kelly; Bartolo, Max; Ruder, Sebastian; Locatelli, Acyr; Kreutzer, Julia; Frosst, Nick; Gomez, Aidan; Blunsom, Phil; Fadaee, Marzieh; Üstün, Ahmet; Hooker, Sara
Format: Article
Language: English
Subjects: Large language models; Multilingualism
Publisher: Cornell University Library, arXiv.org (Ithaca)
EISSN: 2331-8422
Online access: Full text