Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach

Bibliographic Details
Published in: Information retrieval (Boston) 2023-12, Vol.26 (1-2), p.14, Article 14
Main authors: Al-Khatib, Ra’ed M., Zerrouki, Taha, Abu Shquier, Mohammed M., Balla, Amar
Format: Article
Language: English
Description
Summary: Stemming algorithms are crucial tools for enhancing information retrieval in natural language processing. This paper presents a novel Arabic light stemming algorithm called Tashaphyne0.4. The idea behind the algorithm is to extract the most precise ‘roots’ and ‘stems’ from the words of an Arabic text. The proposed algorithm thus acts as a rooter, a stemmer, and a segmentation tool at the same time. The approach involves three phases: Preparation, Stems-Extractor, and Root-Extractor. Tashaphyne0.4 has shown better results than six other stemmers (Khoja, ISRI, Motaz/Light10, Tashaphyne0.3, FARASA, and Assem). The comparison was performed on four comprehensive Arabic benchmark datasets. In conclusion, the proposed stemmer outperformed the competing stemmers in extracting ‘roots’ and ‘stems’.
ISSN: 1386-4564, 1573-7659
DOI: 10.1007/s10791-023-09429-y
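
Illustration: the summary names three phases (Preparation, Stems-Extractor, Root-Extractor) without describing how they work. The following is a minimal toy sketch of what such a three-phase light-stemming pipeline might look like. It is not the Tashaphyne0.4 algorithm. The affix lists, the set of "extra" letters, and all function names are hypothetical and chosen only for illustration; a real stemmer relies on much larger affix tables and additional rules.

```python
import re

# Hypothetical, deliberately tiny affix lists (assumption, not from the paper).
PREFIXES = ["وال", "بال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة"]

# Letters this toy treats as removable non-root letters (assumption).
EXTRA_LETTERS = set("اوي")

# Arabic short-vowel marks (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def prepare(word):
    """Phase 1 (Preparation): strip diacritics and tatweel."""
    return DIACRITICS.sub("", word)


def extract_stem(word, min_len=3):
    """Phase 2: remove at most one prefix and one suffix, longest first,
    never shrinking the word below min_len letters."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def extract_root(stem, min_len=3):
    """Phase 3: drop 'extra' letters left to right while more than
    min_len letters remain."""
    letters = list(stem)
    i = 0
    while i < len(letters) and len(letters) > min_len:
        if letters[i] in EXTRA_LETTERS:
            del letters[i]
        else:
            i += 1
    return "".join(letters)


def analyze(word):
    """Run the three phases and return (stem, root)."""
    stem = extract_stem(prepare(word))
    return stem, extract_root(stem)


if __name__ == "__main__":
    for w in ["والكاتبون", "كِتَاب"]:
        print(w, analyze(w))
```

With these toy lists, `analyze("والكاتبون")` strips the prefix "وال" and the suffix "ون" to give the stem "كاتب", then drops the infixed alif to give the root "كتب". Splitting the work this way lets one pass return both a stem and a root, which mirrors the summary's claim that a single tool can serve as stemmer and rooter; how the published algorithm actually does this is described in the full text, not here.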