Bengali reduplication generation with finite-state transducers (FSTs)

Bibliographic details
Published in:	International Journal of Speech Technology, 2024, Vol. 27 (3), pp. 729-737
Main authors:	Barman, Abhijit; Saha, Diganta; Pal, Alok Ranjan
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Bengali; Computational linguistics; Data mining; Engineering; Finite state automata; Linguistics; Natural language processing; Reduplication; Sentiment analysis; Signal, Image and Speech Processing; Social Sciences; Tagging (Computational linguistics); Transducers; Word formation
Online access:	Full text

Description
Abstract:	Reduplication is a highly productive process in Bengali word formation, with significant implications for natural language processing (NLP) applications such as part-of-speech tagging and sentiment analysis. Despite its importance, it has not been extensively explored in computational linguistics, especially for low-resource languages like Bengali. This study first demonstrates that a two-way finite-state transducer (FST) can effectively capture the generation of complete reduplication in Bengali. Second, it shows that forming partial reduplication requires a set of two-way FSTs, because Bengali partial reduplication follows diverse patterns. Third, it highlights the utility of the reduplication generation process for identifying instances of reduplication in Bengali text, achieving an F1 score of 88.11%. This method outperforms current state-of-the-art methods for identifying reduplicated expressions in Bengali text. The research contributes insights into the computational representation of reduplication in Bengali, offering potential enhancements for NLP tasks in low-resource language scenarios.
ISSN:	1381-2416, 1572-8110
DOI:	10.1007/s10772-024-10124-6
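
The abstract's first claim, that a two-way FST can generate complete reduplication, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hand-written machine with four hypothetical states (`start`, `copy_first`, `rewind`, `copy_second`), hypothetical boundary markers `<` and `>` that do not occur in the input word, and a space as the separator between the two copies. The property it is meant to show is that the read head may move left as well as right, so the word can be copied twice without storing it in the state.

```python
LEFT, RIGHT = "<", ">"  # hypothetical boundary markers, assumed absent from the input


def total_reduplicate(word, sep=" "):
    """Simulate a small two-way FST that emits word + sep + word."""
    tape = [LEFT, *word, RIGHT]
    state, pos, out = "start", 0, []
    while state != "halt":
        sym = tape[pos]
        if state == "start":
            # On the left marker: emit nothing and step right.
            state, pos = "copy_first", pos + 1
        elif state == "copy_first":
            if sym == RIGHT:
                # End of the first pass: emit the separator and turn around.
                out.append(sep)
                state, pos = "rewind", pos - 1
            else:
                out.append(sym)  # copy the symbol, keep moving right
                pos += 1
        elif state == "rewind":
            if sym == LEFT:
                # Back at the start: begin the second copy.
                state, pos = "copy_second", pos + 1
            else:
                pos -= 1  # move left, emit nothing
        else:  # copy_second
            if sym == RIGHT:
                state = "halt"
            else:
                out.append(sym)
                pos += 1
    return "".join(out)


print(total_reduplicate("dhire"))  # prints "dhire dhire"
```

The romanized input `dhire` is only an illustrative word; the same function copies any string symbol by symbol. Partial reduplication, which the abstract says needs a set of two-way FSTs, is not covered by this sketch.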