A Machine Learning Approach on Outlier Removal for Decision Tree Regression Method

Bibliographic Details
Published in:	Ingénierie des systèmes d'Information 2024-08, Vol.29 (4), p.1397-1403
Main authors:	Sihabuddin, Agus; Rokhman, Nur; Wahyudi, Erwin Eko
Format:	Article
Language:	English; French
Subjects:	Accuracy; Air quality; Algorithms; Data analysis; Data points; Datasets; Decision trees; Error analysis; Machine learning; Methods; Missing data; Outliers (statistics); Root-mean-square errors; Statistical analysis; Statistical methods
Online access:	Full text

Description
Abstract:	Outliers occur in many application areas and adversely affect the performance of prediction methods. They can be removed using robust statistical algorithms. However, statistical methods have limitations in capturing outliers in high-dimensional data. Machine Learning (ML) approaches have been proposed as an alternative; they are developing rapidly owing to their excellent interpretability and strong generalization capabilities. ML is therefore popular for detecting or eliminating outliers to increase the accuracy of forecasting methods. One example is Isolation Forest (IF), an unsupervised outlier detection strategy that uses a collective approach to calculate an isolation score for every data point. The objective of this research is to improve the prediction accuracy of the Decision Tree Regression (DTR) method by proposing IF as an ML-based outlier removal method. The proposed method was tested on two Air Quality Index (AQI) datasets that contained outliers, with Mean Absolute Error (MAE), R-squared, and Root Mean Square Error (RMSE) as the accuracy measurements. The results showed that the proposed method outperforms the methods reported in previous studies.
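The workflow summarized in the abstract (IF-based outlier removal, then DTR, scored with MAE, RMSE, and R-squared) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the synthetic data stand in for the AQI datasets, and the contamination rate, the choice to fit IF on features and target jointly, the filtering of the training split only, and the helper names `remove_outliers` and `evaluate` are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def remove_outliers(X, y, contamination=0.05, seed=0):
    """Drop rows that Isolation Forest flags as outliers (hypothetical helper)."""
    # Assumption: features and target are scored together so that
    # anomalous target values can also be isolated.
    data = np.column_stack([X, y])
    forest = IsolationForest(contamination=contamination, random_state=seed)
    keep = forest.fit_predict(data) == 1  # 1 = inlier, -1 = outlier
    return X[keep], y[keep], keep


def evaluate(X_train, y_train, X_test, y_test, seed=0):
    """Fit a decision tree regressor and report MAE, RMSE, and R-squared."""
    model = DecisionTreeRegressor(random_state=seed).fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }


# Synthetic stand-in data (NOT the AQI datasets used in the article).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.5, size=500)

# Inject artificial outliers into 5% of the target values.
outlier_rows = rng.choice(500, size=25, replace=False)
y[outlier_rows] += rng.uniform(30.0, 60.0, size=25)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: DTR trained on the raw training split.
baseline = evaluate(X_train, y_train, X_test, y_test)

# IF + DTR: filter the training split first, then train the same regressor.
X_clean, y_clean, keep = remove_outliers(X_train, y_train)
cleaned = evaluate(X_clean, y_clean, X_test, y_test)

print("rows removed:", int((~keep).sum()))
print("baseline:", baseline)
print("IF + DTR:", cleaned)
```

Whether the filtered model scores better depends on the data and on the contamination rate, so the comparison should be read from the printed metrics rather than assumed.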
ISSN:	1633-1311 2116-7125
DOI:	10.18280/isi.290414