The African Stopwords project: curating stopwords for African languages
Main authors: | |
---|---|
Format: | Article |
Language: | eng |
Abstract: | Stopwords are fundamental to Natural Language Processing (NLP) techniques for
information retrieval, and removing them is a common step in preprocessing
text data. While high-resource languages such as English benefit from several
available stopword lists, low-resource languages, such as those spoken on the
African continent, have none that are standardized and available for use in
NLP packages. Stopwords in African languages are understudied and can reveal
information about the crossover between languages. The African Stopwords
project aims to study and curate stopwords for African languages. In this
paper, we present our current progress on ten African languages as well as
future plans for the project. |
---|---|
DOI: | 10.48550/arxiv.2304.12155 |
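
The preprocessing step the abstract refers to, stopword removal, can be illustrated with a minimal Python sketch. The stopword set and the function name `remove_stopwords` are hypothetical and use a few English words purely for illustration; they are not taken from the African Stopwords project, whose curated lists would be substituted in practice.

```python
import re

# Hypothetical, illustrative stopword set (English); NOT from the project.
STOPWORDS = {"the", "is", "of", "a", "in", "and"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The removal of stopwords is a common task"))
```

A set is used for the stopword collection so that each membership check is constant time; swapping in a list for another language only requires passing a different `stopwords` argument.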