LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models

Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-...

Detailed description

Bibliographic details
Published in: arXiv.org 2024-11
Main authors: Carta, Salvatore Mario; Chessa, Stefano; Contu, Giulia; Corriga, Andrea; Deidda, Andrea; Fenu, Gianni; Frigau, Luca; Giuliani, Alessandro; Grassi, Luca; Manca, Marco Manolo; Marras, Mirko; Mola, Francesco; Mossa, Bastianino; Mura, Piergiorgio; Ortu, Marco; Piano, Leonardo; Pisano, Simone; Pisu, Alessia; Podda, Alessandro Sebastian; Pompianu, Livio; Seu, Simone; Sandro Gabriele Tiddia
Format: Article
Language: eng
Subjects: Artificial intelligence; Cultural resources; Languages; Linguistics
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Carta, Salvatore Mario
Chessa, Stefano
Contu, Giulia
Corriga, Andrea
Deidda, Andrea
Fenu, Gianni
Frigau, Luca
Giuliani, Alessandro
Grassi, Luca
Manca, Marco Manolo
Marras, Mirko
Mola, Francesco
Mossa, Bastianino
Mura, Piergiorgio
Ortu, Marco
Piano, Leonardo
Pisano, Simone
Pisu, Alessia
Podda, Alessandro Sebastian
Pompianu, Livio
Seu, Simone
Sandro Gabriele Tiddia
description Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3131612119</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3131612119</sourcerecordid><originalsourceid>FETCH-proquest_journals_31316121193</originalsourceid><addsrcrecordid>eNqNjN1qwkAQhZeCoFTfYaDXgexu_eudlaYtRCoq3sqikzQ27qQz-QGf3kB8AK8OfOc750kNjLU6mL0a01cjkXMYhmYyNeOxHagm_l69L95g4eGnQB9sqeIjQsTugg3xHyTEUP4irBkFuXZlRh6cP8He5cTZtQOUQExNsEHp9rHzaeVSFKgk8yl8okdu1RphRSfMZah6icsFR_d8Vi_Rx275FRRM_xVKeTi3T76tDlZbPdFG67l9zLoBXWtMhg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3131612119</pqid></control><display><type>article</type><title>LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models</title><source>Free E- Journals</source><creator>Carta, Salvatore Mario ; Chessa, Stefano ; Contu, Giulia ; Corriga, Andrea ; Deidda, Andrea ; Fenu, Gianni ; Frigau, Luca ; Giuliani, Alessandro ; Grassi, Luca ; Manca, Marco Manolo ; Marras, Mirko ; Mola, Francesco ; Mossa, Bastianino ; Mura, Piergiorgio ; Ortu, Marco ; Piano, Leonardo ; Pisano, Simone ; Pisu, Alessia ; Podda, Alessandro Sebastian ; Pompianu, Livio ; Seu, Simone ; Sandro Gabriele Tiddia</creator><creatorcontrib>Carta, Salvatore Mario ; Chessa, Stefano ; Contu, Giulia ; Corriga, Andrea ; Deidda, Andrea ; Fenu, Gianni ; Frigau, Luca ; Giuliani, Alessandro ; Grassi, Luca ; Manca, Marco Manolo ; Marras, Mirko ; Mola, Francesco ; Mossa, Bastianino ; Mura, Piergiorgio ; Ortu, Marco ; Piano, Leonardo ; Pisano, Simone ; Pisu, Alessia ; Podda, Alessandro Sebastian ; Pompianu, Livio ; Seu, Simone ; Sandro Gabriele Tiddia</creatorcontrib><description>Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource 
languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Artificial intelligence ; Cultural resources ; Languages ; Linguistics</subject><ispartof>arXiv.org, 2024-11</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Carta, Salvatore Mario</creatorcontrib><creatorcontrib>Chessa, Stefano</creatorcontrib><creatorcontrib>Contu, Giulia</creatorcontrib><creatorcontrib>Corriga, Andrea</creatorcontrib><creatorcontrib>Deidda, Andrea</creatorcontrib><creatorcontrib>Fenu, Gianni</creatorcontrib><creatorcontrib>Frigau, Luca</creatorcontrib><creatorcontrib>Giuliani, Alessandro</creatorcontrib><creatorcontrib>Grassi, Luca</creatorcontrib><creatorcontrib>Manca, Marco Manolo</creatorcontrib><creatorcontrib>Marras, Mirko</creatorcontrib><creatorcontrib>Mola, Francesco</creatorcontrib><creatorcontrib>Mossa, 
Bastianino</creatorcontrib><creatorcontrib>Mura, Piergiorgio</creatorcontrib><creatorcontrib>Ortu, Marco</creatorcontrib><creatorcontrib>Piano, Leonardo</creatorcontrib><creatorcontrib>Pisano, Simone</creatorcontrib><creatorcontrib>Pisu, Alessia</creatorcontrib><creatorcontrib>Podda, Alessandro Sebastian</creatorcontrib><creatorcontrib>Pompianu, Livio</creatorcontrib><creatorcontrib>Seu, Simone</creatorcontrib><creatorcontrib>Sandro Gabriele Tiddia</creatorcontrib><title>LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models</title><title>arXiv.org</title><description>Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. 
By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.</description><subject>Artificial intelligence</subject><subject>Cultural resources</subject><subject>Languages</subject><subject>Linguistics</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjN1qwkAQhZeCoFTfYaDXgexu_eudlaYtRCoq3sqikzQ27qQz-QGf3kB8AK8OfOc750kNjLU6mL0a01cjkXMYhmYyNeOxHagm_l69L95g4eGnQB9sqeIjQsTugg3xHyTEUP4irBkFuXZlRh6cP8He5cTZtQOUQExNsEHp9rHzaeVSFKgk8yl8okdu1RphRSfMZah6icsFR_d8Vi_Rx275FRRM_xVKeTi3T76tDlZbPdFG67l9zLoBXWtMhg</recordid><startdate>20241120</startdate><enddate>20241120</enddate><creator>Carta, Salvatore Mario</creator><creator>Chessa, Stefano</creator><creator>Contu, Giulia</creator><creator>Corriga, Andrea</creator><creator>Deidda, Andrea</creator><creator>Fenu, Gianni</creator><creator>Frigau, Luca</creator><creator>Giuliani, Alessandro</creator><creator>Grassi, Luca</creator><creator>Manca, Marco Manolo</creator><creator>Marras, Mirko</creator><creator>Mola, Francesco</creator><creator>Mossa, Bastianino</creator><creator>Mura, Piergiorgio</creator><creator>Ortu, Marco</creator><creator>Piano, Leonardo</creator><creator>Pisano, Simone</creator><creator>Pisu, Alessia</creator><creator>Podda, Alessandro Sebastian</creator><creator>Pompianu, Livio</creator><creator>Seu, Simone</creator><creator>Sandro Gabriele Tiddia</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20241120</creationdate><title>LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models</title><author>Carta, Salvatore Mario ; Chessa, Stefano ; Contu, Giulia ; Corriga, Andrea ; Deidda, Andrea ; Fenu, Gianni ; Frigau, Luca ; Giuliani, Alessandro ; Grassi, Luca ; Manca, Marco Manolo ; Marras, Mirko ; Mola, Francesco ; Mossa, Bastianino ; Mura, Piergiorgio ; Ortu, Marco ; Piano, Leonardo ; Pisano, Simone ; Pisu, Alessia ; Podda, Alessandro Sebastian ; Pompianu, Livio ; Seu, Simone ; Sandro Gabriele Tiddia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_31316121193</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial intelligence</topic><topic>Cultural resources</topic><topic>Languages</topic><topic>Linguistics</topic><toplevel>online_resources</toplevel><creatorcontrib>Carta, Salvatore Mario</creatorcontrib><creatorcontrib>Chessa, Stefano</creatorcontrib><creatorcontrib>Contu, Giulia</creatorcontrib><creatorcontrib>Corriga, Andrea</creatorcontrib><creatorcontrib>Deidda, Andrea</creatorcontrib><creatorcontrib>Fenu, Gianni</creatorcontrib><creatorcontrib>Frigau, Luca</creatorcontrib><creatorcontrib>Giuliani, Alessandro</creatorcontrib><creatorcontrib>Grassi, Luca</creatorcontrib><creatorcontrib>Manca, Marco Manolo</creatorcontrib><creatorcontrib>Marras, Mirko</creatorcontrib><creatorcontrib>Mola, Francesco</creatorcontrib><creatorcontrib>Mossa, 
Bastianino</creatorcontrib><creatorcontrib>Mura, Piergiorgio</creatorcontrib><creatorcontrib>Ortu, Marco</creatorcontrib><creatorcontrib>Piano, Leonardo</creatorcontrib><creatorcontrib>Pisano, Simone</creatorcontrib><creatorcontrib>Pisu, Alessia</creatorcontrib><creatorcontrib>Podda, Alessandro Sebastian</creatorcontrib><creatorcontrib>Pompianu, Livio</creatorcontrib><creatorcontrib>Seu, Simone</creatorcontrib><creatorcontrib>Sandro Gabriele Tiddia</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Carta, Salvatore Mario</au><au>Chessa, Stefano</au><au>Contu, Giulia</au><au>Corriga, Andrea</au><au>Deidda, Andrea</au><au>Fenu, Gianni</au><au>Frigau, Luca</au><au>Giuliani, Alessandro</au><au>Grassi, Luca</au><au>Manca, Marco Manolo</au><au>Marras, Mirko</au><au>Mola, Francesco</au><au>Mossa, Bastianino</au><au>Mura, Piergiorgio</au><au>Ortu, Marco</au><au>Piano, Leonardo</au><au>Pisano, 
Simone</au><au>Pisu, Alessia</au><au>Podda, Alessandro Sebastian</au><au>Pompianu, Livio</au><au>Seu, Simone</au><au>Sandro Gabriele Tiddia</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models</atitle><jtitle>arXiv.org</jtitle><date>2024-11-20</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3131612119
source Free E- Journals
subjects Artificial intelligence
Cultural resources
Languages
Linguistics
title LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T05%3A31%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=LIMBA:%20An%20Open-Source%20Framework%20for%20the%20Preservation%20and%20Valorization%20of%20Low-Resource%20Languages%20using%20Generative%20Models&rft.jtitle=arXiv.org&rft.au=Carta,%20Salvatore%20Mario&rft.date=2024-11-20&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3131612119%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3131612119&rft_id=info:pmid/&rfr_iscdi=true