Parts-of-speech tagging of Nepali texts with Bidirectional LSTM, Conditional Random Fields and HMM
Published in: | Multimedia tools and applications 2024, Vol.83 (4), p.9893-9909 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Parts-of-Speech (POS) tagging is a fundamental pre-processing step for Natural Language Processing (NLP) tasks such as text summarization, named entity recognition, dependency parsing (or parsing in general), classification, sentiment analysis, machine translation, and information extraction. Various state-of-the-art models have been implemented for POS tagging of many natural languages. However, our literature survey establishes that the problem has not been addressed rigorously for the Nepali language and that no comprehensive comparative studies have been presented. Nepali is an under-resourced and highly inflectional language, and it therefore encodes information such as gender, person, number, mood, and aspect within its word forms. Precise disambiguation of these inflected words is critical in Nepali text analysis. In this paper, POS tagging using the Hidden Markov Model (HMM), Conditional Random Fields (CRF), and Long Short-Term Memory (LSTM) is presented for the language, along with a comprehensive comparative study of the three models. Experiments show that the CRF-based technique outperforms the HMM model, and that a deep neural network based technique, LSTM, in turn outperforms CRF, reaching an accuracy of 99.6%. This study demonstrates that deep learning based models are exceptional at disambiguating the rich morphological information encoded by Nepali words. |
---|---|
ISSN: | 1380-7501, 1573-7721 |
DOI: | 10.1007/s11042-023-15679-1 |