A high-quality gap-filled daily ETo dataset for China during 1951-2021 from synoptic stations


Bibliographic details
Main authors: Zhou, Ningshan; Wu, Lifeng; Yang, Qiliang; Yang, Ling; Dong, Jianhua; Li, Yue
Format: Dataset
Language: English
Description
Summary: Reference evapotranspiration (ETo) is essential for agricultural water-consumption and land-water cycle research. Synoptic data from meteorological stations provide reliable ground data for estimating ETo with the FAO-56 Penman-Monteith equation. However, the five primary variables this equation needs, namely maximum temperature (Tmax), minimum temperature (Tmin), sunshine duration (SSD), wind speed (Wind), and relative humidity (RH), often suffer severe data loss in synoptic records due to force majeure events. This loss directly introduces severe gaps into the ETo records computed from them. Machine learning algorithms can fill various data gaps with low error rates; however, to achieve high data quality, the algorithms must be selected to suit the distinct types of data loss and trained independently. Here, based on the characteristics of the data, we investigated the data gaps in the synoptic dataset and classified them into two major groups: the common, minor gaps (Tmax loss, Tmin loss, SSD loss, Wind loss, RH loss, Wind and SSD loss, and Wind and RH loss), and the other 19 types of data loss, which lose more information but rarely occur. Our results show that the XGBoost model achieved the best accuracy among the three machine learning models tested, with high statistical scores. For the other 19 types of data gaps, LSTM models were trained separately for each site and achieved average R², RMSE, and nRMSE of 0.9, 0.5 mm d⁻¹, and 38%, respectively, across all 2419 stations. We thus present a high-quality, gap-filled daily ETo dataset for China covering 1951-2021, in which the proportion of large errors (daily ETo errors greater than 1.5 mm d⁻¹) is below 0.2%. Our results also reveal that the degree of entanglement among synoptic variables varies considerably from region to region in China.
DOI:10.5281/zenodo.11496931
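
Illustration: the summary sorts gaps by which of the five input variables are missing and reports R², RMSE, nRMSE, and the share of errors above 1.5 mm d⁻¹. The following is a minimal Python sketch of both ideas, not the authors' code. The function names, the use of None for a missing value, the normalisation of nRMSE by the observed mean, and the definition of R² as the coefficient of determination are all assumptions made here for illustration. The sketch also lumps every pattern outside the seven listed ones into a single "rare" label rather than enumerating the 19 types.

```python
import math

# The five FAO-56 inputs named in the summary.
VARIABLES = ("Tmax", "Tmin", "SSD", "Wind", "RH")

# The common, minor gap patterns listed in the summary.
COMMON_GAPS = {
    frozenset({"Tmax"}),
    frozenset({"Tmin"}),
    frozenset({"SSD"}),
    frozenset({"Wind"}),
    frozenset({"RH"}),
    frozenset({"Wind", "SSD"}),
    frozenset({"Wind", "RH"}),
}


def gap_type(record):
    """Return 'complete', 'common', or 'rare' for one daily record.

    `record` maps variable names to values; None (or an absent key) marks
    a missing value. This is an assumed convention, not the dataset's
    actual encoding.
    """
    missing = frozenset(v for v in VARIABLES if record.get(v) is None)
    if not missing:
        return "complete"
    return "common" if missing in COMMON_GAPS else "rare"


def error_metrics(observed, predicted, large_error=1.5):
    """Return R2, RMSE, nRMSE (%) and the share of large errors (%)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sq_err = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(sq_err) / n)
    # Assumed: R2 is the coefficient of determination.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sum(sq_err) / ss_tot
    # Assumed: nRMSE is RMSE divided by the observed mean, in percent.
    nrmse = 100.0 * rmse / mean_obs
    # Share of days whose absolute ETo error exceeds the threshold (mm/d).
    n_large = sum(abs(o - p) > large_error for o, p in zip(observed, predicted))
    return {
        "r2": r2,
        "rmse": rmse,
        "nrmse": nrmse,
        "large_error_pct": 100.0 * n_large / n,
    }
```

For example, a record missing both Wind and SSD would be labelled "common", while one missing Tmax and Tmin together would fall under "rare".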