Automatic Detection of the Boundary between Metadata and Body in Persian Theses using BA_SVM

Bibliographic details
Published in: Pizhūhishnāmah-i pardāzish va mudiriyyat-i iṭṭilāʻāt (Online) 2021-07, Vol. 36 (4), pp. 1159-1179
Main authors: Mohadese Rahnama, Seyed Mohammad Hossein Hasheminejad, Jalal A Nasiri
Format: Article
Language: Persian
Description
Abstract: Metadata extraction facilitates indexing and improves information retrieval, and automating it is more efficient than extracting metadata manually. Examples of thesis metadata are the names of students and professors, the title, field, degree, abstract, and keywords. This paper aims to automatically detect the boundary between the metadata and the main body of Persian theses. To this end, 250 theses were collected from the IRANDOC system. Features were extracted from the paragraphs of each thesis, and the paragraphs were then classified into two classes, metadata and body, using a support vector machine (SVM). The Bat algorithm (BA) was used to set the SVM parameters. The results show that the proposed method predicts the type of a paragraph with 96.6% accuracy.
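The BA_SVM pipeline summarized in the abstract (paragraph features, a two-class SVM, and Bat-algorithm parameter tuning) can be sketched in plain Python as follows. This is a hypothetical, minimal illustration, not the authors' implementation. The abstract does not name the features, the kernel, or which SVM parameters were tuned. The sketch therefore assumes three made-up features per paragraph (relative position in the thesis, capped word count, and a constant bias term), a linear SVM trained with the Pegasos sub-gradient method, and a Bat-algorithm search over log10 of the regularization strength λ only. The frequency, velocity, loudness, and pulse-rate updates follow Yang's (2010) Bat algorithm, and the local-walk step size is an arbitrary choice. `make_corpus` generates synthetic toy theses rather than IRANDOC data, and every function name is illustrative.

```python
import math
import random

def paragraph_features(paragraphs):
    """Hypothetical features per paragraph: relative position, capped word count, bias."""
    n = len(paragraphs)
    return [[i / max(n - 1, 1), min(len(p.split()), 100) / 100.0, 1.0]
            for i, p in enumerate(paragraphs)]

def train_svm(X, y, lam, epochs=5, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient descent on the hinge loss).
    The bias is learned as the weight of the constant last feature."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 regularizer
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    # +1 = metadata paragraph, -1 = body paragraph
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

def bat_search(objective, lo, hi, n_bats=8, n_iter=15, f_min=0.0, f_max=2.0,
               alpha=0.9, gamma=0.9, seed=0):
    """Minimize a 1-D objective on [lo, hi] with the Bat algorithm."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lo), hi)

    x = [rng.uniform(lo, hi) for _ in range(n_bats)]
    vel = [0.0] * n_bats
    loud = [1.0] * n_bats                       # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]  # initial pulse emission rates
    rate = list(r0)
    fit = [objective(xi) for xi in x]
    best = min(range(n_bats), key=lambda i: fit[i])
    best_x, best_f = x[best], fit[best]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] += (x[i] - best_x) * freq
            cand = clip(x[i] + vel[i])
            if rng.random() > rate[i]:
                # Local random walk around the best bat, scaled by mean loudness
                # (the 0.1 * range step size is an arbitrary choice for this sketch).
                cand = clip(best_x + 0.1 * (hi - lo) * rng.uniform(-1, 1) * sum(loud) / n_bats)
            f_cand = objective(cand)
            if f_cand <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha                              # bat gets quieter
                rate[i] = r0[i] * (1 - math.exp(-gamma * t))  # rate grows toward r0[i] with t
            if f_cand <= best_f:
                best_x, best_f = cand, f_cand
    return best_x, best_f

def make_corpus(n_theses, rng):
    """Synthetic stand-in for labelled theses (not IRANDOC data): a few short
    front-matter paragraphs (+1, metadata) followed by longer body paragraphs (-1)."""
    X, y = [], []
    for _ in range(n_theses):
        n_meta, n_body = rng.randint(4, 8), rng.randint(15, 25)
        lengths = ([rng.randint(2, 30) for _ in range(n_meta)]
                   + [rng.randint(20, 120) for _ in range(n_body)])
        X += paragraph_features([" ".join(["w"] * k) for k in lengths])
        y += [1] * n_meta + [-1] * n_body
    return X, y

def run_demo(seed=0):
    """Tune log10(lambda) with the Bat algorithm; return (lambda, validation accuracy)."""
    rng = random.Random(seed)
    X_tr, y_tr = make_corpus(10, rng)
    X_va, y_va = make_corpus(5, rng)

    def val_error(log_lam):
        return 1.0 - accuracy(train_svm(X_tr, y_tr, 10 ** log_lam), X_va, y_va)

    log_lam, err = bat_search(val_error, -4.0, 0.0, seed=seed)
    return 10 ** log_lam, 1.0 - err

if __name__ == "__main__":
    lam, acc = run_demo()
    print(f"selected lambda = {lam:.3g}, validation accuracy = {acc:.3f}")
```

Running the file prints the selected λ and its validation accuracy on the synthetic data. These numbers are unrelated to the 96.6% reported in the paper. Searching over log10 λ rather than λ itself lets the bats cover several orders of magnitude evenly. If a kernel SVM were used instead, its kernel width would become a second search dimension.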
ISSN: 2251-8223, 2251-8231