Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data


Bibliographic details
Published in:	Applied Intelligence (Dordrecht, Netherlands), 2013-04, Vol. 38 (3), p. 315-330
Main authors:	Márquez-Vera, Carlos; Cano, Alberto; Romero, Cristóbal; Ventura, Sebastián
Format:	Article
Language:	English
Subjects:	Academic achievement; Academic failure; Accuracy; Algorithms; Applied sciences; Artificial Intelligence; Biological and medical sciences; Computer Science; Computer science, control theory, systems; Data mining; Data processing. List processing. Character string processing; Decision making; Decision trees; Distance learning; Education; Exact sciences and technology; Fundamental and applied biological sciences. Psychology; General aspects; Machines; Manufacturing; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Mechanical Engineering; Memory organisation. Data processing; Middle schools; Neural networks; Occupational training. Personnel. Work management; Processes; Secondary education; Software; Statistical methods; Students; Success
Online access:	Full text

Description
Summary:	Predicting student failure at school is a difficult challenge because of both the large number of factors that can affect poor student performance and the imbalanced nature of these datasets. In this paper, a genetic programming algorithm and several data mining approaches are proposed for solving these problems, using real data on 670 high school students from Zacatecas, Mexico. First, we select the best attributes to address the problem of high dimensionality. Then, data rebalancing and cost-sensitive classification are applied to address the problem of classifying imbalanced data. We also propose using a genetic programming model, compared against different white-box techniques, to obtain classification rules that are both more comprehensible and more accurate. The outcomes of each approach are shown and compared in order to select the one that best improves classification accuracy, specifically with regard to which students might fail.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-012-0374-8
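
Illustrative sketch (not taken from the article): the summary names two remedies for imbalanced data, rebalancing and cost-sensitive classification, without saying which algorithms were used. The Python below is a minimal sketch of one simple form of each, under stated assumptions: rebalancing is done by random oversampling of the minority class, and cost-sensitive classification is done by choosing the class with the lowest expected misclassification cost. The function names, the toy data, and the cost values are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def min_cost_class(probs, cost):
    """Return the prediction with the lowest expected cost.

    probs: {true_class: estimated probability}
    cost:  cost[true_class][predicted_class] -> penalty
    """
    classes = list(probs)

    def expected_cost(predicted):
        return sum(probs[t] * cost[t][predicted] for t in classes)

    return min(classes, key=expected_cost)


# Hypothetical toy data: 9 passing students, 1 failing student.
rows = [[i] for i in range(10)]
labels = ["pass"] * 9 + ["fail"]
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)

# Hypothetical costs: missing a failing student is 5x worse than a false alarm.
cost = {"fail": {"fail": 0, "pass": 5}, "pass": {"fail": 1, "pass": 0}}
decision = min_cost_class({"fail": 0.3, "pass": 0.7}, cost)
```

With these assumed costs, a student whose estimated failure probability is only 0.3 is still flagged as at risk, because predicting "pass" carries the higher expected cost (1.5 versus 0.7). This is the general idea behind cost-sensitive classification: the decision shifts toward the minority class without altering the data.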