AR-Sanad 280K: A Novel 280K Artificial Sanads Dataset for Hadith Narrator Disambiguation

Bibliographic Details
Published in:	Information (Basel), 2022-02, Vol. 13 (2), p. 55
Main authors:	Mahmoud, Somaia; Saif, Omar; Nabil, Emad; Abdeen, Mohammad; ElNainay, Mustafa; Torki, Marwan
Format:	Article
Language:	English
Subjects:	AraBERT; Arabic dataset; Arabic literature; Authenticity; Biographies; Chains; Datasets; Deep learning; Islamic hadith; Machine learning; Muhammad (prophet); Muslims; Names; narrator disambiguation; natural language processing; Phraseologisms; Reliability aspects; Religion; Religious literature; Word sense disambiguation
Online access:	Full text

Description
Abstract:	Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings after the Quran. When authenticating a hadith, the reliability of its narrators is a major factor that hadith scholars consider. However, many narrators share similar names, and narrators' full names are not usually included in the narration chains of hadiths. Thus, ambiguous narrators must first be identified before their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and could be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address narrator disambiguation in several experimental setups. Hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by fine-tuning a BERT-based deep learning model (AraBERT). We obtained a Micro F1 score of 92.9 and a sanad error rate (SER) of 30.2 on the validation set of our artificial AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the six famous books of Islamic hadith. We evaluated the best model on the real test data and achieved a Micro F1 score of 83.5 and a sanad error rate of 60.6.
ISSN:	2078-2489
DOI:	10.3390/info13020055
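
The abstract reports two metrics side by side: Micro F1 over individual narrators and sanad error rate (SER) over whole chains. The following is a minimal sketch of how the two could be computed, not the authors' evaluation code. It assumes that each narrator mention receives exactly one predicted class label and that a sanad counts as an error when at least one of its narrators is misidentified; the abstract does not spell out the SER definition, so that reading is an assumption. The function names and the toy label IDs are hypothetical.

```python
from typing import Sequence


def micro_f1(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1, pooling every narrator mention across all sanads."""
    tp = fp = fn = 0
    for gold_chain, pred_chain in zip(gold, pred):
        for g, p in zip(gold_chain, pred_chain):
            if g == p:
                tp += 1
            else:
                fp += 1  # the predicted narrator class gains a false positive
                fn += 1  # the true narrator class gains a false negative
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sanad_error_rate(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Fraction of sanads with at least one wrong narrator (assumed definition)."""
    wrong = sum(1 for g, p in zip(gold, pred) if list(g) != list(p))
    return wrong / len(gold)


# Toy data: three sanads, nine narrator mentions, labels are made-up class IDs.
gold = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
pred = [[1, 2, 3], [4, 0], [6, 7, 8, 0]]

print(f"Micro F1: {micro_f1(gold, pred):.3f}")
print(f"SER:      {sanad_error_rate(gold, pred):.3f}")
```

In this toy case only two of nine narrators are wrong, yet two of three sanads count as errors. Under the assumed definition, a single misidentified narrator fails the whole chain, which is one way to read the gap between the reported Micro F1 and SER values.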