Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction

With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in the field of engineering. Several studies have evaluated these algorithms’ potential to predict pavement serviceability; however, some challenges limit their use....

Bibliographic details
Published in:	Transportation research record 2023-08, Vol.2677 (8), p.196-206
Main authors:	Aranha, Ana Luisa; Bernucci, Liedi Légi Bariani; Vasconcelos, Kamilla L.
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	206
container_issue	8
container_start_page	196
container_title	Transportation research record
container_volume	2677
creator	Aranha, Ana Luisa; Bernucci, Liedi Légi Bariani; Vasconcelos, Kamilla L.
description	With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in the field of engineering. Several studies have evaluated these algorithms’ potential to predict pavement serviceability; however, some challenges limit their use. Training data preprocessing has a great impact on the model’s predictive performance, is highly dependent on the modeler’s experience, and is not typically reported in engineering-related literature. The objective of this study was to assess the effects of data preprocessing, hyperparameter selection, and time series size on the model’s evaluation metrics. Therefore, this paper analyzes the performance of three ML algorithms on maximum deflection (D0) and international roughness index (IRI) prediction: support vector machine, random forest (RF), and artificial neural network (ANN). An R² and mean square error (MSE) analysis was conducted on 12 training datasets, with two sizes of historical data and five stages of data preprocessing. The results indicated that ANN was the most accurate technique, with an R² of 0.99 and MSE of 20 × 10⁻³ mm on the D0 prediction and an R² of 0.91 and MSE of 0.03 m/km on the IRI prediction. RF was also identified as an effective technique, generating similar results with less data preprocessing. The addition of structural and traffic categorical features to the training dataset resulted in the most significant improvement of the support vector regression and ANN performance metrics; the hyperparameter selection was effective only on IRI prediction, especially with the ANN algorithm.
doi_str_mv	10.1177/03611981231155902
format	Article
fullrecord	<record><control><sourceid>sage_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1177_03611981231155902</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_03611981231155902</sage_id><sourcerecordid>10.1177_03611981231155902</sourcerecordid><originalsourceid>FETCH-LOGICAL-c284t-aa977f338571244af806652aeca4bd787d12ea46d386eafae575110dcafe29583</originalsourceid><addsrcrecordid>eNp9kE1PwzAMhiMEEmPwA7jlD3TE-WjSI9rGh7SJHca5MqkzOm0pSgoS_54WuCFxsq3Hj2W9jF2DmAFYeyNUCVA5kArAmErIEzaRUFaFFkaessnIi3HhnF3kvBdCKW3VhNEyBPJ95l3gi3boE8WebxO2sY07vsAeM4048jX61zYSXxGmb7juGjpkHrrEN_hBx9HcUBrmI0ZPfJOoaX3fdvGSnQU8ZLr6rVP2fLfczh-K1dP94_x2VXjpdF8gVtYGpZyxILXG4ERZGonkUb801tkGJKEuG-VKwoBkrAEQjcdAsjJOTRn83PWpyzlRqN9Se8T0WYOox5zqPzkNzuzHybijet-9pzi8-I_wBWXHaNk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction</title><source>SAGE Complete</source><creator>Aranha, Ana Luisa ; Bernucci, Liedi Légi Bariani ; Vasconcelos, Kamilla L.</creator><creatorcontrib>Aranha, Ana Luisa ; Bernucci, Liedi Légi Bariani ; Vasconcelos, Kamilla L.</creatorcontrib><description>With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in the field of engineering. Several studies have evaluated these algorithms’ potential to predict pavement serviceability, however some challenges limit its use. Training data preprocessing has a great impact on the model’s predictive performance, is highly dependent on the modeler’s experience, and is not typically reported in engineering-related literature. The objective of this study was to assess the effects of data preprocessing, hyperparameter selection, and time series size on the model’s evaluation metrics. 
Therefore, this paper analyzes the performance of three ML algorithms on maximum deflection (D0) and international roughness index (IRI) prediction: support vector machine, random forest (RF), and artificial neural network (ANN). An R2 and mean square error (MSE) analysis was conducted on 12 training datasets, with two sizes of historical data and five stages of data preprocessing. The results indicated that ANN was the most accurate technique with an R2 of 0.99 and MSE of 20 ×10−3 mm on the D0 prediction and an R2 of 0.91 and MSE of 0.03 m/km on the IRI prediction. RF was also identified as an effective technique, generating similar results with less data preprocessing. The addition of structural and traffic categorical features to the training dataset resulted in the most significant improvement of the support vector regression and ANN performance metrics; the hyperparameter selection was effective only on IRI prediction, especially with the ANN algorithm.</description><identifier>ISSN: 0361-1981</identifier><identifier>EISSN: 2169-4052</identifier><identifier>DOI: 10.1177/03611981231155902</identifier><language>eng</language><publisher>Los Angeles, CA: SAGE Publications</publisher><ispartof>Transportation research record, 2023-08, Vol.2677 (8), p.196-206</ispartof><rights>National Academy of Sciences: Transportation Research Board 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c284t-aa977f338571244af806652aeca4bd787d12ea46d386eafae575110dcafe29583</citedby><cites>FETCH-LOGICAL-c284t-aa977f338571244af806652aeca4bd787d12ea46d386eafae575110dcafe29583</cites><orcidid>0000-0003-4305-4829 ; 0000-0002-4768-0993 ; 
0000-0003-0084-1400</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/03611981231155902$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/03611981231155902$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,776,780,21798,27901,27902,43597,43598</link.rule.ids></links><search><creatorcontrib>Aranha, Ana Luisa</creatorcontrib><creatorcontrib>Bernucci, Liedi Légi Bariani</creatorcontrib><creatorcontrib>Vasconcelos, Kamilla L.</creatorcontrib><title>Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction</title><title>Transportation research record</title><description>With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in the field of engineering. Several studies have evaluated these algorithms’ potential to predict pavement serviceability, however some challenges limit its use. Training data preprocessing has a great impact on the model’s predictive performance, is highly dependent on the modeler’s experience, and is not typically reported in engineering-related literature. The objective of this study was to assess the effects of data preprocessing, hyperparameter selection, and time series size on the model’s evaluation metrics. Therefore, this paper analyzes the performance of three ML algorithms on maximum deflection (D0) and international roughness index (IRI) prediction: support vector machine, random forest (RF), and artificial neural network (ANN). An R2 and mean square error (MSE) analysis was conducted on 12 training datasets, with two sizes of historical data and five stages of data preprocessing. 
The results indicated that ANN was the most accurate technique with an R2 of 0.99 and MSE of 20 ×10−3 mm on the D0 prediction and an R2 of 0.91 and MSE of 0.03 m/km on the IRI prediction. RF was also identified as an effective technique, generating similar results with less data preprocessing. The addition of structural and traffic categorical features to the training dataset resulted in the most significant improvement of the support vector regression and ANN performance metrics; the hyperparameter selection was effective only on IRI prediction, especially with the ANN algorithm.</description><issn>0361-1981</issn><issn>2169-4052</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kE1PwzAMhiMEEmPwA7jlD3TE-WjSI9rGh7SJHca5MqkzOm0pSgoS_54WuCFxsq3Hj2W9jF2DmAFYeyNUCVA5kArAmErIEzaRUFaFFkaessnIi3HhnF3kvBdCKW3VhNEyBPJ95l3gi3boE8WebxO2sY07vsAeM4048jX61zYSXxGmb7juGjpkHrrEN_hBx9HcUBrmI0ZPfJOoaX3fdvGSnQU8ZLr6rVP2fLfczh-K1dP94_x2VXjpdF8gVtYGpZyxILXG4ERZGonkUb801tkGJKEuG-VKwoBkrAEQjcdAsjJOTRn83PWpyzlRqN9Se8T0WYOox5zqPzkNzuzHybijet-9pzi8-I_wBWXHaNk</recordid><startdate>202308</startdate><enddate>202308</enddate><creator>Aranha, Ana Luisa</creator><creator>Bernucci, Liedi Légi Bariani</creator><creator>Vasconcelos, Kamilla L.</creator><general>SAGE Publications</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0003-4305-4829</orcidid><orcidid>https://orcid.org/0000-0002-4768-0993</orcidid><orcidid>https://orcid.org/0000-0003-0084-1400</orcidid></search><sort><creationdate>202308</creationdate><title>Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction</title><author>Aranha, Ana Luisa ; Bernucci, Liedi Légi Bariani ; Vasconcelos, Kamilla 
L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c284t-aa977f338571244af806652aeca4bd787d12ea46d386eafae575110dcafe29583</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Aranha, Ana Luisa</creatorcontrib><creatorcontrib>Bernucci, Liedi Légi Bariani</creatorcontrib><creatorcontrib>Vasconcelos, Kamilla L.</creatorcontrib><collection>CrossRef</collection><jtitle>Transportation research record</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Aranha, Ana Luisa</au><au>Bernucci, Liedi Légi Bariani</au><au>Vasconcelos, Kamilla L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction</atitle><jtitle>Transportation research record</jtitle><date>2023-08</date><risdate>2023</risdate><volume>2677</volume><issue>8</issue><spage>196</spage><epage>206</epage><pages>196-206</pages><issn>0361-1981</issn><eissn>2169-4052</eissn><abstract>With improvements in data collection, storage, and processing, machine learning (ML) is gaining momentum as a behavior prediction method in the field of engineering. Several studies have evaluated these algorithms’ potential to predict pavement serviceability, however some challenges limit its use. Training data preprocessing has a great impact on the model’s predictive performance, is highly dependent on the modeler’s experience, and is not typically reported in engineering-related literature. The objective of this study was to assess the effects of data preprocessing, hyperparameter selection, and time series size on the model’s evaluation metrics. 
Therefore, this paper analyzes the performance of three ML algorithms on maximum deflection (D0) and international roughness index (IRI) prediction: support vector machine, random forest (RF), and artificial neural network (ANN). An R2 and mean square error (MSE) analysis was conducted on 12 training datasets, with two sizes of historical data and five stages of data preprocessing. The results indicated that ANN was the most accurate technique with an R2 of 0.99 and MSE of 20 ×10−3 mm on the D0 prediction and an R2 of 0.91 and MSE of 0.03 m/km on the IRI prediction. RF was also identified as an effective technique, generating similar results with less data preprocessing. The addition of structural and traffic categorical features to the training dataset resulted in the most significant improvement of the support vector regression and ANN performance metrics; the hyperparameter selection was effective only on IRI prediction, especially with the ANN algorithm.</abstract><cop>Los Angeles, CA</cop><pub>SAGE Publications</pub><doi>10.1177/03611981231155902</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0003-4305-4829</orcidid><orcidid>https://orcid.org/0000-0002-4768-0993</orcidid><orcidid>https://orcid.org/0000-0003-0084-1400</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0361-1981
ispartof	Transportation research record, 2023-08, Vol.2677 (8), p.196-206
issn	0361-1981 2169-4052
language	eng
recordid	cdi_crossref_primary_10_1177_03611981231155902
source	SAGE Complete
title	Effects of Different Training Datasets on Machine Learning Models for Pavement Performance Prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T11%3A59%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-sage_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20of%20Different%20Training%20Datasets%20on%20Machine%20Learning%20Models%20for%20Pavement%20Performance%20Prediction&rft.jtitle=Transportation%20research%20record&rft.au=Aranha,%20Ana%20Luisa&rft.date=2023-08&rft.volume=2677&rft.issue=8&rft.spage=196&rft.epage=206&rft.pages=196-206&rft.issn=0361-1981&rft.eissn=2169-4052&rft_id=info:doi/10.1177/03611981231155902&rft_dat=%3Csage_cross%3E10.1177_03611981231155902%3C/sage_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_sage_id=10.1177_03611981231155902&rfr_iscdi=true