Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish


Main authors: Philippy, Fred; Haddadan, Shohreh; Guo, Siwen
Format: Article
Language: eng
creator Philippy, Fred
Haddadan, Shohreh
Guo, Siwen
description In NLP, zero-shot classification (ZSC) is the task of assigning labels to textual data without any labeled examples for the target classes. A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels. However, this approach faces certain challenges, particularly for languages with limited resources. In this paper, we propose an alternative solution that leverages dictionaries as a source of data for ZSC. We focus on Luxembourgish, a low-resource language spoken in Luxembourg, and construct two new topic relevance classification datasets based on a dictionary that provides various synonyms, word translations and example sentences. We evaluate the usability of our dataset and compare it with the NLI-based approach on two topic classification tasks in a zero-shot manner. Our results show that by using the dictionary-based dataset, the trained models outperform the ones following the NLI-based approach for ZSC. While we focus on a single low-resource language in this study, we believe that the efficacy of our approach can also transfer to other languages where such a dictionary is available.
doi_str_mv 10.48550/arxiv.2404.03912
format Article
creationdate 2024-04-05
rights http://creativecommons.org/licenses/by/4.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2404.03912
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.03912
language eng
recordid cdi_arxiv_primary_2404_03912
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
title Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T02%3A45%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Forget%20NLI,%20Use%20a%20Dictionary:%20Zero-Shot%20Topic%20Classification%20for%20Low-Resource%20Languages%20with%20Application%20to%20Luxembourgish&rft.au=Philippy,%20Fred&rft.date=2024-04-05&rft_id=info:doi/10.48550/arxiv.2404.03912&rft_dat=%3Carxiv_GOX%3E2404_03912%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true