A machine learning modeling prediction of enrollment among admitted college applicants at University of Santo Tomas
Main authors: | |
---|---|
Format: | Conference proceedings |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Predicting enrollment has become a critical part of institutional planning in higher education. The annual forecast of the number of enrolled students plays a very important role because budgets and expenditures are based on the number of student enrollees. Data mining enables organizations to use their current reporting capabilities to uncover and understand hidden patterns in vast databases; these patterns are then built into data mining models and used to predict individual behavior with high accuracy. The University of Santo Tomas Entrance Test (USTET) and the College Application Form are part of the University's admission requirements. Every year, a new batch of examinees attempts to pass the USTET; however, some USTET passers do not enroll. The dataset used in this study was gathered from the UST Office for Admission and the UST Office of the Secretary-General. To predict enrollment at the University of Santo Tomas, the admission details of students from the past five academic years (A.Y. 2015-2016 up to A.Y. 2018-2019) were used. Application Form attributes with no data mining value for classifying an admitted applicant's enrollment behavior were ignored, and attributes with limited data were excluded. The dataset contains twenty-four (24) main attributes: twenty-three (23) input variables and one output (label) variable with two categories, ENROLLED and NOT ENROLLED. Four data mining algorithms, namely Artificial Neural Network, Bayesian Network, Decision Tree, and Logistic Regression, were implemented and tested for efficiency in RapidMiner Studio Version 9.0.003. The algorithms were evaluated on the following performance measures: accuracy, sensitivity, specificity, precision, and negative predictive value, and the most efficient model was determined by comparing these measures. The results showed that Naïve Bayes had the highest accuracy on the validation (holdout) dataset, at 99.98%. |
ISSN: | 0094-243X, 1551-7616 |
DOI: | 10.1063/5.0100174 |
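The abstract names five performance measures without defining them. Each is a ratio of binary confusion-matrix counts. Below is a minimal Python sketch, assuming ENROLLED is treated as the positive class (the abstract does not say which class is positive). The names `ConfusionCounts`, `evaluate`, and `confusion_from_labels` are hypothetical illustrations, not RapidMiner's API or the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; ENROLLED is ASSUMED to be the positive class."""
    tp: int  # predicted ENROLLED, actually ENROLLED
    fp: int  # predicted ENROLLED, actually NOT ENROLLED
    tn: int  # predicted NOT ENROLLED, actually NOT ENROLLED
    fn: int  # predicted NOT ENROLLED, actually ENROLLED


def _ratio(num: int, den: int) -> float:
    # Return 0.0 rather than raising ZeroDivisionError for an empty denominator.
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the five measures named in the abstract from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate (recall)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


def confusion_from_labels(actual: List[str], predicted: List[str],
                          positive: str = "ENROLLED") -> ConfusionCounts:
    """Tally confusion-matrix counts from parallel lists of class labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)
```

For example, with hypothetical counts `ConfusionCounts(tp=90, fp=10, tn=80, fn=20)` (illustrative only, not results from the study), `evaluate` returns an accuracy of 0.85, a precision of 0.9, and an NPV of 0.8. Reporting sensitivity, specificity, and NPV alongside accuracy matters when one class dominates, since a model can score high accuracy while misclassifying most of the minority class.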
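The abstract reports that Naïve Bayes achieved the highest holdout accuracy but does not describe how the model was configured. Naïve Bayes is the simplest Bayesian network: the class is the sole parent of every attribute, so attributes are assumed conditionally independent given the class. The following is a minimal Python sketch of a categorical Naïve Bayes classifier with Laplace (add-alpha) smoothing, assuming all attributes are discrete. It is not the authors' RapidMiner model, and the class name `CategoricalNaiveBayes` and the toy attributes in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for discrete attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # value_counts[j][cls][value]: how often feature j takes `value` in class `cls`
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        # distinct values seen per feature, used in the smoothing denominator
        self.vocab = [set() for _ in range(n_features)]
        for row, cls in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[j][cls][value] += 1
                self.vocab[j].add(value)
        return self

    def _log_posterior(self, row, cls):
        # log P(cls) + sum_j log P(x_j | cls), up to a constant shared by all classes
        class_total = self.class_counts[cls]
        score = math.log(class_total / self.n)
        for j, value in enumerate(row):
            count = self.value_counts[j][cls][value]
            k = len(self.vocab[j])
            score += math.log((count + self.alpha) / (class_total + self.alpha * k))
        return score

    def predict(self, row):
        # Choose the class with the largest (smoothed) log-posterior.
        return max(self.class_counts, key=lambda cls: self._log_posterior(row, cls))
```

For instance, trained on hypothetical two-attribute rows such as `("yes", "near")` labeled ENROLLED and `("no", "far")` labeled NOT ENROLLED, `predict` returns the class with the larger smoothed log-posterior. Summing logarithms instead of multiplying probabilities avoids numeric underflow, which becomes relevant with many input attributes such as the 23 used in the study.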