A Survey of Resources and Methods for Natural Language Processing of Serbian Language
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | The Serbian language is a Slavic language spoken by over 12 million speakers
and well understood by over 15 million people. In the area of natural language
processing, it can be considered a low-resource language. Serbian is also
considered a highly inflected language. The combination of many word
inflections and low availability of language resources makes natural language
processing of Serbian challenging. Nevertheless, over the past three decades,
there have been a number of initiatives to develop resources and methods for
natural language processing of Serbian, ranging from a corpus of free text
from books and the internet and annotated corpora for classification and
named entity recognition tasks to various methods and models that perform these
tasks. In this paper, we review the initiatives, resources, methods, and their
availability. |
---|---|
DOI: | 10.48550/arxiv.2304.05468 |