Exploring effective ways to increase reliable positive samples for machine learning-based urban waterlogging susceptibility assessments
Published in: | Journal of Environmental Management 2023-10, Vol. 344, Article 118682 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Machine learning (ML)-based urban waterlogging susceptibility studies suffer from class imbalance, as fewer positive samples are generally available than potential negative samples. Few studies have considered optimizing the results by improving the quality of training samples. To address this issue, we explored effective approaches to reliably increase the number of positive samples for such studies. The Synthetic Minority Over-Sampling Technique (SMOTE) and the Optimized Seed Spread Algorithm (OSSA), representative of oversampling approaches (synthesizing new samples based on the feature space) and physical approaches (simulating the potential inundated area based on the mechanisms of water flow), respectively, were employed to increase the number of positive samples. Waterlogging in Shenzhen was selected as a case study using eight selected spatial variables. An elaborate experiment was conducted to compare the quality of the added samples based on the classifiers' performance and the accuracy of waterlogging susceptibility maps (WSMs). The results indicated that (1) classifiers trained with SMOTE-augmented samples performed worse than those trained on the original samples, while the use of OSSA improved the trained classifiers, and (2) the accuracy of WSMs was not improved with SMOTE but increased markedly with OSSA. These results may be driven by the diversity of information and features of the added samples. This study indicates that SMOTE fails to synthesize reliable samples when applied to waterlogging analysis in Shenzhen, whereas an effective solution for generating reliable positive samples is to use OSSA, which simulates the potential submerged regions based on the mechanisms of disaster occurrence and spread.
• Machine learning was used to evaluate waterlogging susceptibility in Shenzhen.
• SMOTE and OSSA were used to increase positive samples for machine learning.
• SMOTE fails to increase positive samples with high quality.
• OSSA can generate reliable positive samples, improving the assessment results. |
---|---|
ISSN: | 0301-4797 1095-8630 |
DOI: | 10.1016/j.jenvman.2023.118682 |
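
The abstract describes SMOTE as synthesizing new samples based on the feature space. As a rough illustration of that idea only, the sketch below implements the generic SMOTE interpolation step with NumPy: each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the study's implementation. The function name `smote_sketch`, its parameters, and the toy data (6 positive samples with 8 features, echoing the eight spatial variables) are assumptions made for illustration. OSSA is not sketched here because the abstract does not give its details.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generic SMOTE-style interpolation (illustrative, not the paper's code).

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # which sample to start from
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to use
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)

    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Hypothetical toy data: 6 positive samples described by 8 features.
toy_rng = np.random.default_rng(42)
X_pos = toy_rng.random((6, 8))
X_new = smote_sketch(X_pos, n_new=10, k=3)
```

Because every synthetic point is a convex combination of two existing positives, the new samples stay inside the feature range already covered by the originals. That property may be related to the abstract's remark that the results could be driven by the diversity of information in the added samples, though the abstract itself does not state the mechanism.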