Arabic text clustering technique to improve information retrieval

Bibliographic details
Main authors: Aliwy, Ahmed H., Aljanabi, Kadhim B. S., Alameen, Huda A.
Format: Conference proceeding
Language: English
Online access: Full text
Description
Summary: Arabic has characteristics that set it apart from other languages. It is a concatenative language, which creates many problems for automatic processing. This research project attempts to address the problems facing users of Arabic documents by proposing a new approach in which retrieved documents are grouped into a limited number of clusters, which may help users find the relevant documents efficiently and effectively. First, the data was collected and pre-processed. A complete IR system covering all IR processing stages was then implemented and, from the user's point of view, improved with clustering techniques that group the retrieved documents into k clusters. Three clustering techniques were applied: k-means as a type of flat clustering, and Ward’s and average-linkage agglomerative clustering as types of hierarchical clustering. Two types of stemmer (heavy and light) were applied in the pre-processing phase. The results obtained by applying the proposed approach to one thousand Arabic text documents (collected from the “Alsabaah” newspaper) showed that users can be guided in a way that returns relevant documents with better accuracy and performance. The techniques and algorithms were implemented in C# and Python.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0066837
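
Illustration: the summary describes grouping retrieved documents into k clusters, with average-linkage agglomerative clustering as one of the techniques. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the documents have already been tokenised and stemmed, and it assumes TF-IDF weighting with cosine distance, neither of which the summary specifies. The function names and the English placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into L2-normalised TF-IDF dicts (assumed weighting)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity for two L2-normalised sparse vectors."""
    return 1.0 - sum(w * v[t] for t, w in u.items() if t in v)


def average_linkage(vectors, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(cosine_distance(vectors[i], vectors[j])
                        for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical retrieved documents, already tokenised and stemmed.
retrieved = [
    ["election", "parliament", "vote"],
    ["parliament", "vote", "minister"],
    ["football", "match", "goal"],
    ["goal", "match", "league"],
]
groups = average_linkage(tfidf_vectors(retrieved), k=2)
print(groups)
```

In this toy input the two politics documents share terms with each other but not with the two sports documents, so they end up in separate groups. A real system would replace the placeholder tokens with the output of a light or heavy Arabic stemmer.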