Aya 23: Open Weight Releases to Further Multilingual Progress
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
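Since the abstract describes releasing open weights for the 8B and 35B models, the snippet below sketches how such weights might be loaded and queried with the Hugging Face transformers library. This is a hedged illustration, not part of the record: the repository id `CohereForAI/aya-23-8B` and the example prompt are assumptions, so verify the actual identifiers against the official release.

```python
# Hedged sketch: loading released Aya 23 open weights via Hugging Face
# transformers. The repo id below is an assumption for illustration; check
# the official release for the real identifier of the 8B or 35B weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repo id; a 35B variant is also described

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A multilingual prompt (Turkish: "What is the capital of Turkey?") to
# exercise one of the 23 covered languages.
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```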
Published in: | arXiv.org, 2024-05 |
---|---|
Main Authors: | Aryabumi, Viraat; Dang, John; Talupuru, Dwarak; Dash, Saurabh; Cairuz, David; Lin, Hangyu; Venkitesh, Bharat; Smith, Madeline; Campos, Jon Ander; Tan, Yi Chern; Marchisio, Kelly; Bartolo, Max; Ruder, Sebastian; Locatelli, Acyr; Kreutzer, Julia; Frosst, Nick; Gomez, Aidan; Blunsom, Phil; Fadaee, Marzieh; Üstün, Ahmet; Hooker, Sara |
Format: | Article |
Language: | English (eng) |
Subjects: | Large language models; Multilingualism |
Online Access: | Full text |
container_title | arXiv.org |
---|---|
creator | Aryabumi, Viraat; Dang, John; Talupuru, Dwarak; Dash, Saurabh; Cairuz, David; Lin, Hangyu; Venkitesh, Bharat; Smith, Madeline; Campos, Jon Ander; Tan, Yi Chern; Marchisio, Kelly; Bartolo, Max; Ruder, Sebastian; Locatelli, Acyr; Kreutzer, Julia; Frosst, Nick; Gomez, Aidan; Blunsom, Phil; Fadaee, Marzieh; Üstün, Ahmet; Hooker, Sara |
description | This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth versus breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers and widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3063930719 |
source | Free E-Journals |
subjects | Large language models; Multilingualism |
title | Aya 23: Open Weight Releases to Further Multilingual Progress |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T22%3A33%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Aya%2023:%20Open%20Weight%20Releases%20to%20Further%20Multilingual%20Progress&rft.jtitle=arXiv.org&rft.au=Aryabumi,%20Viraat&rft.date=2024-05-31&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3063930719%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3063930719&rft_id=info:pmid/&rfr_iscdi=true |