Predicting Student Performance from Online Engagement Activities Using Novel Statistical Features


Bibliographic details
Published in: Arabian journal for science and engineering 2022, Vol.47 (8), p.10225-10243
Main author: Brahim, Ghassen Ben
Format: Article
Language: English
Online access: Full text
Description
Summary: Predicting students’ performance during their years of academic study has been investigated extensively. It offers important insights that can help institutions make timely decisions and changes that lead to better student outcomes. In the post-COVID-19 pandemic era, the adoption of e-learning has gained momentum and has increased the availability of online learning data. This has encouraged researchers to develop machine learning (ML)-based models to predict students’ performance during online classes. The study presented in this paper focuses on predicting student performance during a series of online interactive sessions, using a dataset collected with a digital electronics education and design suite. The dataset tracks students’ interaction during online lab work in terms of text editing, number of keystrokes, time spent on each activity, etc., along with the exam score achieved per session. Our proposed prediction model extracts a total of 86 novel statistical features, which were semantically grouped into three broad categories: (1) activity type, (2) timing statistics, and (3) peripheral activity count. This feature set was further reduced during the feature selection phase, and only influential features were retained for training. Our proposed ML model aims to predict whether a student’s performance will be low or high. Five popular classifiers were used in our study: random forest (RF), support vector machine, Naïve Bayes, logistic regression, and multilayer perceptron. We evaluated our model under three scenarios: (1) an 80:20 random data split for training and testing, (2) fivefold cross-validation, and (3) training the model on all sessions but one, which is held out for testing. Results showed that our model achieved its best classification accuracy of 97.4% with the RF classifier. We demonstrated that, under a similar experimental setup, our model outperformed models from existing studies.
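The three evaluation scenarios named in the summary can be illustrated as index-splitting routines. The following is a minimal sketch using only the Python standard library; it is not the authors' code. The function names, the seed, and the toy session labels are hypothetical, and no classifier is trained here.

```python
import random
from collections import defaultdict


def random_split(n_rows, test_fraction=0.2, seed=0):
    """Scenario 1: shuffle row indices once and hold out a fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(n_rows, k=5, seed=0):
    """Scenario 2: each of k disjoint folds serves as the test set exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_one_session_out(session_ids):
    """Scenario 3: hold out every row of one session and train on all others."""
    by_session = defaultdict(list)
    for row, session in enumerate(session_ids):
        by_session[session].append(row)
    for held_out, test in sorted(by_session.items()):
        train = [row for s, rows in by_session.items() if s != held_out for row in rows]
        yield held_out, train, test


# Hypothetical toy data: 10 rows recorded across 3 sessions.
sessions = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
train, test = random_split(len(sessions))
folds = list(k_fold(len(sessions)))
loso = list(leave_one_session_out(sessions))
```

In the third scenario, rows from the same session never appear on both sides of the split, so the test measures how well a model generalizes to an unseen session rather than to unseen rows of a known one.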
ISSN: 2193-567X, 1319-8025, 2191-4281
DOI: 10.1007/s13369-021-06548-w