A Comparison of Multiple Imputation Methods for Recovering Missing Data in Hydrological Studies

Missing data is a common problem in hydrological studies; therefore, data reconstruction is critical, especially when it is crucial to employ all available resources, even incomplete records. Furthermore, missing data could have an impact on statistical analysis results, and the amount of variabilit...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Civil Engineering Journal 2021-09, Vol.7 (9), p.1608-1619
Hauptverfasser: Hamzah, Fatimah Bibi, Mohd Hamzah, Firdaus, Mohd Razali, Siti Fatin, Samad, Hafiza
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Missing data is a common problem in hydrological studies; therefore, data reconstruction is critical, especially when it is crucial to employ all available resources, even incomplete records. Furthermore, missing data could have an impact on statistical analysis results, and the amount of variability in the data would not be fittingly anticipated. As a result, this study compared the performance of three imputation methods in predicting recurrence in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression tree (CART). Furthermore, entire historical daily streamflow data from 2012 to 2014 (as training dataset) were utilised to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. Following that, all three methods coupled with multiple linear regression (MLR), were used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The estimation techniques effectiveness was evaluated using metrics inclusive of the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results confirmed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that the RRRI-MLR is the best method for dealing with missing data in streamflow datasets. Doi: 10.28991/cej-2021-03091747 Full Text: PDF
ISSN:2676-6957
2476-3055
DOI:10.28991/cej-2021-03091747