A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach

This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of m...

Bibliographic details
Published in: Education and information technologies 2021-03, Vol.26 (2), p.1527-1547
Main authors: Costa-Mendes, Ricardo; Oliveira, Tiago; Castelli, Mauro; Cruz-Jesus, Frederico
Format: Article
Language: eng
Online access: Full text
container_end_page 1547
container_issue 2
container_start_page 1527
container_title Education and information technologies
container_volume 26
creator Costa-Mendes, Ricardo
Oliveira, Tiago
Castelli, Mauro
Cruz-Jesus, Frederico
description This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p-values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.
doi_str_mv 10.1007/s10639-020-10316-y
format Article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2503196935</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A713712797</galeid><ericid>EJ1292193</ericid><sourcerecordid>A713712797</sourcerecordid><originalsourceid>FETCH-LOGICAL-c452t-2420fbd36736310f4f04d9132b1816bde59fa004718ee28e78a069c6d446f6e53</originalsourceid><addsrcrecordid>eNp9UUtr3DAQNqWBpkn_QKEg6NnpjGRLVm9LSPogkBzas9DKI1thV9pKXuj--6p16QNK0WGEvsdo5mualwhXCKDeFAQpdAscWgSBsj09ac6xV6JVEoan9S4ktFz06lnzvJRHANCq4-cNbdjeujlEYjuyOYY4MXs45PQ17O0SUmTJs2UmxgF79pDycpyOVIjNYZpZcXNKO1aW40hxYVO2I5W3bMPm0zaHcXWq9pfNmbe7Qi9-1ovm8-3Np-v37d39uw_Xm7vWdT1fWt5x8NtRSCWkQPCdh27UKPgWB5TbkXrtLUCncCDiA6nBgtROjl0nvaReXDSvV9_a9kv95mIe0zHH2tLwvu5FSy3-YE12RyZEn5Zs3T4UZzYKhUKutKqsq3-w6hlpH1yK5EN9_0vAV4HLqZRM3hxyXWI-GQTzPSWzpmRqSuZHSuZURa9WEeXgfgluPiLXHLWouFjxUrE4Uf490X9cvwHq3pxE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2503196935</pqid></control><display><type>article</type><title>A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</title><source>SpringerLink Journals - AutoHoldings</source><creator>Costa-Mendes, Ricardo ; Oliveira, Tiago ; Castelli, Mauro ; Cruz-Jesus, Frederico</creator><creatorcontrib>Costa-Mendes, Ricardo ; Oliveira, Tiago ; Castelli, Mauro ; Cruz-Jesus, Frederico</creatorcontrib><description>This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. 
Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p -values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.</description><identifier>ISSN: 1360-2357</identifier><identifier>EISSN: 1573-7608</identifier><identifier>DOI: 10.1007/s10639-020-10316-y</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Academic Achievement ; Academic grading ; Algorithms ; Analysis ; Artificial Intelligence ; Class Size ; Computation ; Computer Appl. in Social and Behavioral Sciences ; Computer Science ; Computers and Education ; Correlation ; Data Collection ; Data mining ; Education ; Educational Technology ; Electronic Learning ; Foreign Countries ; Grades (Scholastic) ; High School Students ; Information Systems ; Information Systems Applications (incl.Internet) ; Machine learning ; Mathematics ; Neural networks ; Predictive Measurement ; Regression (Statistics) ; Secondary education ; Statistical Analysis ; Statistical Significance ; User Interfaces and Human Computer Interaction</subject><ispartof>Education and information technologies, 2021-03, Vol.26 (2), p.1527-1547</ispartof><rights>The Author(s) 2020</rights><rights>COPYRIGHT 2021 Springer</rights><rights>The Author(s) 2020. 
This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c452t-2420fbd36736310f4f04d9132b1816bde59fa004718ee28e78a069c6d446f6e53</citedby><cites>FETCH-LOGICAL-c452t-2420fbd36736310f4f04d9132b1816bde59fa004718ee28e78a069c6d446f6e53</cites><orcidid>0000-0001-6523-0809 ; 0000-0002-8793-1451 ; 0000-0002-4446-5980 ; 0000-0002-9259-4576</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10639-020-10316-y$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10639-020-10316-y$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids><backlink>$$Uhttp://eric.ed.gov/ERICWebPortal/detail?accno=EJ1292193$$DView record in ERIC$$Hfree_for_read</backlink></links><search><creatorcontrib>Costa-Mendes, Ricardo</creatorcontrib><creatorcontrib>Oliveira, Tiago</creatorcontrib><creatorcontrib>Castelli, Mauro</creatorcontrib><creatorcontrib>Cruz-Jesus, Frederico</creatorcontrib><title>A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</title><title>Education and information technologies</title><addtitle>Educ Inf Technol</addtitle><description>This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning 
algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p -values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.</description><subject>Academic Achievement</subject><subject>Academic grading</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Artificial Intelligence</subject><subject>Class Size</subject><subject>Computation</subject><subject>Computer Appl. 
in Social and Behavioral Sciences</subject><subject>Computer Science</subject><subject>Computers and Education</subject><subject>Correlation</subject><subject>Data Collection</subject><subject>Data mining</subject><subject>Education</subject><subject>Educational Technology</subject><subject>Electronic Learning</subject><subject>Foreign Countries</subject><subject>Grades (Scholastic)</subject><subject>High School Students</subject><subject>Information Systems</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Machine learning</subject><subject>Mathematics</subject><subject>Neural networks</subject><subject>Predictive Measurement</subject><subject>Regression (Statistics)</subject><subject>Secondary education</subject><subject>Statistical Analysis</subject><subject>Statistical Significance</subject><subject>User Interfaces and Human Computer Interaction</subject><issn>1360-2357</issn><issn>1573-7608</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp9UUtr3DAQNqWBpkn_QKEg6NnpjGRLVm9LSPogkBzas9DKI1thV9pKXuj--6p16QNK0WGEvsdo5mualwhXCKDeFAQpdAscWgSBsj09ac6xV6JVEoan9S4ktFz06lnzvJRHANCq4-cNbdjeujlEYjuyOYY4MXs45PQ17O0SUmTJs2UmxgF79pDycpyOVIjNYZpZcXNKO1aW40hxYVO2I5W3bMPm0zaHcXWq9pfNmbe7Qi9-1ovm8-3Np-v37d39uw_Xm7vWdT1fWt5x8NtRSCWkQPCdh27UKPgWB5TbkXrtLUCncCDiA6nBgtROjl0nvaReXDSvV9_a9kv95mIe0zHH2tLwvu5FSy3-YE12RyZEn5Zs3T4UZzYKhUKutKqsq3-w6hlpH1yK5EN9_0vAV4HLqZRM3hxyXWI-GQTzPSWzpmRqSuZHSuZURa9WEeXgfgluPiLXHLWouFjxUrE4Uf490X9cvwHq3pxE</recordid><startdate>20210301</startdate><enddate>20210301</enddate><creator>Costa-Mendes, Ricardo</creator><creator>Oliveira, Tiago</creator><creator>Castelli, Mauro</creator><creator>Cruz-Jesus, Frederico</creator><general>Springer US</general><general>Springer</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>7SW</scope><scope>BJH</scope><scope>BNH</scope><scope>BNI</scope><scope>BNJ</scope><scope>BNO</scope><scope>ERI</scope><scope>PET</scope><scope>REK</scope><scope>WWN</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7XB</scope><scope>88B</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>CJNVE</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>M0P</scope><scope>M2O</scope><scope>MBDVC</scope><scope>PQEDU</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-6523-0809</orcidid><orcidid>https://orcid.org/0000-0002-8793-1451</orcidid><orcidid>https://orcid.org/0000-0002-4446-5980</orcidid><orcidid>https://orcid.org/0000-0002-9259-4576</orcidid></search><sort><creationdate>20210301</creationdate><title>A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</title><author>Costa-Mendes, Ricardo ; Oliveira, Tiago ; Castelli, Mauro ; Cruz-Jesus, Frederico</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c452t-2420fbd36736310f4f04d9132b1816bde59fa004718ee28e78a069c6d446f6e53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Academic Achievement</topic><topic>Academic grading</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Artificial Intelligence</topic><topic>Class Size</topic><topic>Computation</topic><topic>Computer Appl. 
in Social and Behavioral Sciences</topic><topic>Computer Science</topic><topic>Computers and Education</topic><topic>Correlation</topic><topic>Data Collection</topic><topic>Data mining</topic><topic>Education</topic><topic>Educational Technology</topic><topic>Electronic Learning</topic><topic>Foreign Countries</topic><topic>Grades (Scholastic)</topic><topic>High School Students</topic><topic>Information Systems</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Machine learning</topic><topic>Mathematics</topic><topic>Neural networks</topic><topic>Predictive Measurement</topic><topic>Regression (Statistics)</topic><topic>Secondary education</topic><topic>Statistical Analysis</topic><topic>Statistical Significance</topic><topic>User Interfaces and Human Computer Interaction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Costa-Mendes, Ricardo</creatorcontrib><creatorcontrib>Oliveira, Tiago</creatorcontrib><creatorcontrib>Castelli, Mauro</creatorcontrib><creatorcontrib>Cruz-Jesus, Frederico</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>ERIC</collection><collection>ERIC (Ovid)</collection><collection>ERIC</collection><collection>ERIC</collection><collection>ERIC (Legacy Platform)</collection><collection>ERIC( SilverPlatter )</collection><collection>ERIC</collection><collection>ERIC PlusText (Legacy Platform)</collection><collection>Education Resources Information Center (ERIC)</collection><collection>ERIC</collection><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Education Database (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni 
Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>Education Collection</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>Education Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>ProQuest One Education</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Education and information technologies</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Costa-Mendes, Ricardo</au><au>Oliveira, Tiago</au><au>Castelli, Mauro</au><au>Cruz-Jesus, Frederico</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><ericid>EJ1292193</ericid><atitle>A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach</atitle><jtitle>Education and information technologies</jtitle><stitle>Educ Inf Technol</stitle><date>2021-03-01</date><risdate>2021</risdate><volume>26</volume><issue>2</issue><spage>1527</spage><epage>1547</epage><pages>1527-1547</pages><issn>1360-2357</issn><eissn>1573-7608</eissn><abstract>This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set 
of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p -values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10639-020-10316-y</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0001-6523-0809</orcidid><orcidid>https://orcid.org/0000-0002-8793-1451</orcidid><orcidid>https://orcid.org/0000-0002-4446-5980</orcidid><orcidid>https://orcid.org/0000-0002-9259-4576</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1360-2357
ispartof Education and information technologies, 2021-03, Vol.26 (2), p.1527-1547
issn 1360-2357
1573-7608
language eng
recordid cdi_proquest_journals_2503196935
source SpringerLink Journals - AutoHoldings
subjects Academic Achievement
Academic grading
Algorithms
Analysis
Artificial Intelligence
Class Size
Computation
Computer Appl. in Social and Behavioral Sciences
Computer Science
Computers and Education
Correlation
Data Collection
Data mining
Education
Educational Technology
Electronic Learning
Foreign Countries
Grades (Scholastic)
High School Students
Information Systems
Information Systems Applications (incl.Internet)
Machine learning
Mathematics
Neural networks
Predictive Measurement
Regression (Statistics)
Secondary education
Statistical Analysis
Statistical Significance
User Interfaces and Human Computer Interaction
title A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T19%3A01%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20machine%20learning%20approximation%20of%20the%202015%20Portuguese%20high%20school%20student%20grades:%20A%20hybrid%20approach&rft.jtitle=Education%20and%20information%20technologies&rft.au=Costa-Mendes,%20Ricardo&rft.date=2021-03-01&rft.volume=26&rft.issue=2&rft.spage=1527&rft.epage=1547&rft.pages=1527-1547&rft.issn=1360-2357&rft.eissn=1573-7608&rft_id=info:doi/10.1007/s10639-020-10316-y&rft_dat=%3Cgale_proqu%3EA713712797%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2503196935&rft_id=info:pmid/&rft_galeid=A713712797&rft_ericid=EJ1292193&rfr_iscdi=true