Aya 23: Open Weight Releases to Further Multilingual Progress

This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
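
Since the abstract highlights the open-weight release of the 8B and 35B models, a minimal sketch of how one might load and query such weights with the Hugging Face transformers library follows. The repository ID "CohereForAI/aya-23-8B" and the example prompt are assumptions, not details given in the record; substitute whatever ID the weights are actually published under.

```python
# Minimal sketch: loading open multilingual chat-model weights with
# Hugging Face transformers. The repo ID below is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed Hub repo ID for the 8B release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit an 8B model on one GPU
    device_map="auto",          # place layers on available devices automatically
)

# Aya 23 is instruction-tuned, so the tokenizer's chat template applies
# the turn markers the model expects.
messages = [
    {"role": "user", "content": "Translate to Turkish: The weather is nice today."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping in the 35B checkpoint would only require changing the repo ID, at the cost of proportionally more GPU memory.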

Bibliographic Details

Published in: arXiv.org, 2024-05
Main authors: Aryabumi, Viraat; Dang, John; Talupuru, Dwarak; Dash, Saurabh; Cairuz, David; Lin, Hangyu; Venkitesh, Bharat; Smith, Madeline; Campos, Jon Ander; Tan, Yi Chern; Marchisio, Kelly; Bartolo, Max; Ruder, Sebastian; Locatelli, Acyr; Kreutzer, Julia; Frosst, Nick; Gomez, Aidan; Blunsom, Phil; Fadaee, Marzieh; Üstün, Ahmet; Hooker, Sara
Format: Article
Language: English
Subjects: Large language models; Multilingualism
Publisher: Cornell University Library, arXiv.org (Ithaca)
EISSN: 2331-8422
Online access: Full text