LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
Published in: | arXiv.org 2024-11 |
---|---|
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Carta, Salvatore Mario; Chessa, Stefano; Contu, Giulia; Corriga, Andrea; Deidda, Andrea; Fenu, Gianni; Frigau, Luca; Giuliani, Alessandro; Grassi, Luca; Manca, Marco Manolo; Marras, Mirko; Mola, Francesco; Mossa, Bastianino; Mura, Piergiorgio; Ortu, Marco; Piano, Leonardo; Pisano, Simone; Pisu, Alessia; Podda, Alessandro Sebastian; Pompianu, Livio; Seu, Simone; Tiddia, Sandro Gabriele |
description | Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies. |
format | Article |
publisher | Cornell University Library, arXiv.org (Ithaca) |
date | 2024-11-20 |
rights | 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
access | Free to read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3131612119 |
source | Free E- Journals |
subjects | Artificial intelligence; Cultural resources; Languages; Linguistics |
title | LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T05%3A31%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=LIMBA:%20An%20Open-Source%20Framework%20for%20the%20Preservation%20and%20Valorization%20of%20Low-Resource%20Languages%20using%20Generative%20Models&rft.jtitle=arXiv.org&rft.au=Carta,%20Salvatore%20Mario&rft.date=2024-11-20&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3131612119%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3131612119&rft_id=info:pmid/&rfr_iscdi=true |