Bengali reduplication generation with finite-state transducers (FSTs)
Published in: | International Journal of Speech Technology, 2024, Vol. 27 (3), pp. 729-737 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | Reduplication is a highly productive word-formation process in Bengali, with significant implications for natural language processing (NLP) applications such as part-of-speech tagging and sentiment analysis. Despite its importance, it has received little attention in computational linguistics, particularly for low-resource languages such as Bengali. This study first demonstrates that a two-way finite-state transducer (FST) can capture the generation of complete reduplication in Bengali. Second, it shows that partial reduplication requires a set of two-way FSTs, because Bengali partial reduplication follows diverse patterns. Third, it shows that the reduplication generation process can be used to identify instances of reduplication in Bengali text, achieving an F1-score of 88.11% and outperforming current state-of-the-art methods for identifying reduplicated expressions. The work contributes to the computational representation of Bengali reduplication and may improve NLP tasks in low-resource language settings. |
---|---|
ISSN: | 1381-2416, 1572-8110 |
DOI: | 10.1007/s10772-024-10124-6 |
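The abstract states that a single two-way FST can generate complete reduplication, while partial reduplication needs several two-way FSTs. The paper's actual machines are not reproduced in this record, so the Python sketch below is only an illustration under stated assumptions. It simulates a deterministic two-way FST whose read head can move left as well as right: one machine copies a word (complete reduplication, e.g. ধীরে → ধীরে ধীরে), and a hypothetical variant produces an echo word by replacing the first symbol of the copy with ট (e.g. ভাত → ভাত-টাত). The simulator, the state names, the end markers ⋊/⋉, the separators, and the echo onset ট are all illustrative choices, not the authors'. Each Unicode code point is treated as one tape symbol, and the echo machine assumes a consonant-initial word. The `is_total_reduplication` helper is a naive "generate and compare" check loosely mirroring the abstract's idea of using generation for identification; it is not the authors' identification method.

```python
from typing import Callable, Optional, Tuple

# Assumed notation: left and right end-of-tape markers.
LEFT, RIGHT = "⋊", "⋉"

# A transition maps (state, symbol) to (next_state, output, head_move) or None.
Delta = Callable[[str, str], Optional[Tuple[str, str, int]]]


def run_2way_fst(delta: Delta, word: str, start: str = "q0",
                 final: str = "qf", max_steps: int = 10_000) -> str:
    """Simulate a deterministic two-way FST on `word` and return its output."""
    tape = [LEFT] + list(word) + [RIGHT]  # one code point per tape cell
    state, pos, out = start, 0, []
    for _ in range(max_steps):
        if state == final:
            return "".join(out)
        step = delta(state, tape[pos])
        if step is None:
            raise ValueError(f"no transition from {state!r} on {tape[pos]!r}")
        state, emitted, move = step
        out.append(emitted)
        pos += move  # -1 = left, 0 = stay, +1 = right
    raise RuntimeError("step limit exceeded")


def total_redup_delta(sep: str = " ") -> Delta:
    """Hypothetical machine for complete reduplication: w -> w + sep + w."""
    def delta(state, sym):
        if state == "q0" and sym == LEFT:
            return ("q1", "", +1)              # step off the left marker
        if state == "q1":                      # first pass: copy left to right
            return ("q2", sep, -1) if sym == RIGHT else ("q1", sym, +1)
        if state == "q2":                      # rewind silently to the left marker
            return ("q3", "", +1) if sym == LEFT else ("q2", "", -1)
        if state == "q3":                      # second pass: copy again, then halt
            return ("qf", "", 0) if sym == RIGHT else ("q3", sym, +1)
        return None
    return delta


def echo_redup_delta(echo_onset: str = "ট", sep: str = "-") -> Delta:
    """Hypothetical echo-word machine: first symbol of the copy is replaced."""
    def delta(state, sym):
        if state == "q0" and sym == LEFT:
            return ("q1", "", +1)
        if state == "q1":                      # copy the base word
            return ("q2", sep, -1) if sym == RIGHT else ("q1", sym, +1)
        if state == "q2":                      # rewind to the left marker
            return ("q3", "", +1) if sym == LEFT else ("q2", "", -1)
        if state == "q3":                      # replace the (assumed) onset consonant
            return None if sym == RIGHT else ("q4", echo_onset, +1)
        if state == "q4":                      # copy the rest of the word
            return ("qf", "", 0) if sym == RIGHT else ("q4", sym, +1)
        return None
    return delta


def is_total_reduplication(phrase: str, sep: str = " ") -> bool:
    """Naive check: does generating from the first token reproduce the phrase?"""
    base = phrase.split(sep, 1)[0]
    return run_2way_fst(total_redup_delta(sep), base) == phrase


if __name__ == "__main__":
    print(run_2way_fst(total_redup_delta(), "ধীরে"))
    print(run_2way_fst(echo_redup_delta(), "ভাত"))
    print(is_total_reduplication("ধীরে ধীরে"), is_total_reduplication("ধীরে চলো"))
```

The left move is what makes copying possible: after the first pass the head rewinds to ⋊ and reads the word a second time, so a fixed, small set of states handles words of any length. A one-way FST cannot compute the unbounded copy w → ww. Supporting several partial-reduplication patterns would, in this sketch, mean writing one such machine per pattern, which is consistent with the abstract's claim that a set of two-way FSTs is needed.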