Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data

Predicting student failure at school has become a difficult challenge due to both the high number of factors that can affect the low performance of students and the imbalanced nature of these types of datasets. In this paper, a genetic programming algorithm and different data mining approaches are p...

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2013-04, Vol.38 (3), p.315-330
Main authors: Márquez-Vera, Carlos, Cano, Alberto, Romero, Cristóbal, Ventura, Sebastián
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 330
container_issue 3
container_start_page 315
container_title Applied intelligence (Dordrecht, Netherlands)
container_volume 38
creator Márquez-Vera, Carlos
Cano, Alberto
Romero, Cristóbal
Ventura, Sebastián
description Predicting student failure at school has become a difficult challenge due to both the high number of factors that can affect the low performance of students and the imbalanced nature of these types of datasets. In this paper, a genetic programming algorithm and different data mining approaches are proposed for solving these problems using real data about 670 high school students from Zacatecas, Mexico. Firstly, we select the best attributes in order to resolve the problem of high dimensionality. Then, rebalancing of data and cost-sensitive classification have been applied in order to resolve the problem of classifying imbalanced data. We also propose to use a genetic programming model versus different white-box techniques in order to obtain both more comprehensible and more accurate classification rules. The outcomes of each approach are shown and compared in order to select the best to improve classification accuracy, specifically with regard to which students might fail.
doi_str_mv 10.1007/s10489-012-0374-8
format Article
publisher Boston: Springer US
rights Springer Science+Business Media, LLC 2012
2015 INIST-CNRS
Springer Science+Business Media New York 2013
fulltext fulltext
identifier ISSN: 0924-669X
ispartof Applied intelligence (Dordrecht, Netherlands), 2013-04, Vol.38 (3), p.315-330
issn 0924-669X
1573-7497
language eng
recordid cdi_proquest_journals_1315217540
source SpringerNature Journals
subjects Academic achievement
Academic failure
Accuracy
Algorithms
Applied sciences
Artificial Intelligence
Biological and medical sciences
Computer Science
Computer science; control theory; systems
Data mining
Data processing. List processing. Character string processing
Decision making
Decision trees
Distance learning
Education
Exact sciences and technology
Fundamental and applied biological sciences. Psychology
General aspects
Machines
Manufacturing
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Mechanical Engineering
Memory organisation. Data processing
Middle schools
Neural networks
Occupational training. Personnel. Work management
Processes
Secondary education
Software
Statistical methods
Students
Success
title Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T06%3A10%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20student%20failure%20at%20school%20using%20genetic%20programming%20and%20different%20data%20mining%20approaches%20with%20high%20dimensional%20and%20imbalanced%20data&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=M%C3%A1rquez-Vera,%20Carlos&rft.date=2013-04-01&rft.volume=38&rft.issue=3&rft.spage=315&rft.epage=330&rft.pages=315-330&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-012-0374-8&rft_dat=%3Cproquest_cross%3E2911168591%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1315217540&rft_id=info:pmid/&rfr_iscdi=true