Automated Categorization of Research Papers with MONO Supervised Term Weighting in RECApp

Bibliographic Details
Published in:	International journal of advanced computer science & applications 2023, Vol.14 (2)
Main authors:	Biol, Ivic Jan A.; Depositario, Rhey Marc A.; Noangay, Glenn Geo T.; Melchor, Julian Michael F.; Abalorio, Cristopher C.; Bustillo, James Cloyd M.
Format:	Article
Language:	English
Subjects:	Algorithms; Classification; Classifiers; Documents; Information retrieval; Machine learning; Natural language processing; Optical character recognition; Text categorization; Weighting
Online access:	Full text

Description
Abstract:	Natural Language Processing, specifically text classification or text categorization, has become a trend in computer science. Text classification is commonly used to categorize large amounts of data so that information can be retrieved in less time. Students, as well as research advisers and panelists, spend extra time and effort classifying research documents. To solve this problem, the researchers used two state-of-the-art supervised term weighting schemes, TF-MONO and SQRTF-MONO, and applied them to three machine learning algorithms: K-Nearest Neighbor, Linear Support Vector (Linear SVC), and Naive Bayes classifiers. This produced a total of six classifier models, which were compared to ascertain which performs best in classifying research documents, with Optical Character Recognition used for text extraction. The results showed that, among all classification models trained, the combination of SQRTF-MONO and Linear SVC outperformed all other models, with an F1 score of 0.94 on both the abstract and the background-of-the-study datasets. In conclusion, the developed classification model and application prototype can help researchers, advisers, and panelists reduce the time spent classifying research documents.
ISSN:	2158-107X 2156-5570
DOI:	10.14569/IJACSA.2023.0140240
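
Illustration:	The abstract names two weighting schemes, three classifiers, and an F1 score, but gives no formulas. The sketch below is only a rough illustration of those pieces, not the authors' implementation. It assumes that the "TF" and "SQRTF" prefixes refer to the raw and square-rooted term frequency used as the local factor. The MONO global factor is not derived here: `global_weight` is a hypothetical mapping supplied by the caller, and the default of 1.0 for unseen terms is likewise an assumption. All function and variable names are invented for this example.

```python
import math
from collections import Counter
from itertools import product


def local_weights(tokens, scheme="tf"):
    """Return the local (per-document) factor for each term.

    scheme="tf"    -> raw term frequency
    scheme="sqrtf" -> square root of the term frequency
    """
    counts = Counter(tokens)
    if scheme == "tf":
        return {term: float(count) for term, count in counts.items()}
    if scheme == "sqrtf":
        return {term: math.sqrt(count) for term, count in counts.items()}
    raise ValueError(f"unknown scheme: {scheme}")


def weight_document(tokens, global_weight, scheme="tf"):
    """Multiply each local factor by a supervised global factor.

    `global_weight` is a hypothetical term -> score mapping standing in for
    the MONO factor, which would be computed from labelled training data.
    Terms missing from the mapping fall back to 1.0 (an assumption).
    """
    return {
        term: weight * global_weight.get(term, 1.0)
        for term, weight in local_weights(tokens, scheme).items()
    }


def f1_score(tp, fp, fn):
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Two weighting schemes crossed with three classifiers gives the six models.
WEIGHTINGS = ("TF-MONO", "SQRTF-MONO")
CLASSIFIERS = ("K-Nearest Neighbor", "Linear SVC", "Naive Bayes")
MODEL_GRID = list(product(WEIGHTINGS, CLASSIFIERS))
```

Calling `weight_document(tokens, global_weight, scheme="sqrtf")` on a tokenized document returns a term-to-weight dictionary that could be turned into a feature vector for any of the three classifiers; `MODEL_GRID` simply enumerates the six scheme and classifier pairings the abstract describes.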