Addressing class imbalance in avalanche forecasting

Bibliographic details
Published in: Cold regions science and technology, 2025-03, Vol. 231, p. 104411, Article 104411
Main authors: Kala, Manish; Jain, Shweta; Singh, Amreek; Krishnan, Narayanan Chatapuram
Format: Article
Language: English
Description
Abstract: Natural disasters like avalanches and earthquakes are examples of rare events. Predicting such events with supervised classification machine learning models suffers from the class imbalance problem: the number of non-avalanche days exceeds the number of avalanche days, and this skewed data distribution interferes with the construction of the decision boundaries that support the decision-making procedure. This paper analyses class imbalance from the perspective of avalanche prediction using multiple classification approaches, three oversampling and two undersampling techniques, and cost-sensitive approaches. The supervised approaches predict days with and without avalanches as a binary classification task. The study used the past 25 seasons of snow and meteorological parameters recorded for two climatologically diverse, avalanche-prone regions of the Indian Himalayas with different levels of class imbalance. The paper also proposes more extensive use of evaluation metrics that are pertinent to imbalanced domains such as avalanche forecasting, namely balanced accuracy, geometric mean, Probability of Detection (POD) and Peirce Skill Score (PSS). Extensive empirical experiments and evaluations demonstrate that these class-balancing techniques significantly improve the performance of avalanche forecasting models for both regions, albeit with some variations. POD improved to 0.83 for the Random Forest classifier, 0.65 for the Support Vector Machine classifier and 0.75 for the Logistic Regression classifier; PSS also improved, to 0.53, 0.47 and 0.5 for Random Forest, Support Vector Machine and Logistic Regression, respectively. These findings are complemented by theoretical insights into the proposed solutions to class imbalance.
Our results suggest that classification-based avalanche forecasting models trained using the proposed approaches can serve as a valuable supplementary decision-support tool for avalanche forecasters.

Highlights:
• Avalanche forecasting suffers from class imbalance.
• Traditional classification approaches perform poorly.
• Oversampling rare avalanche records, undersampling the majority class.
• Cost-sensitive classifiers with different cost ratios.
• Evaluation metrics: balanced accuracy, geometric mean.
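All four metrics the abstract recommends can be computed from the binary confusion matrix, with avalanche days as the positive class (label 1). The sketch below assumes the standard forecast-verification definitions, which the abstract does not spell out: POD = TP/(TP+FN), probability of false detection POFD = FP/(FP+TN), PSS = POD − POFD, balanced accuracy = (POD + specificity)/2 and geometric mean = sqrt(POD × specificity). The helper names and the toy data are hypothetical and only illustrate why plain accuracy is misleading on imbalanced data.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count hits (TP), misses (FN), false alarms (FP) and correct negatives (TN).

    Label 1 = avalanche day (minority, positive class); 0 = non-avalanche day.
    """
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Imbalance-aware scores under the (assumed) standard definitions."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both avalanche and non-avalanche days are required")
    pod = tp / (tp + fn)          # Probability of Detection (sensitivity, recall)
    pofd = fp / (fp + tn)         # Probability of False Detection (false alarm rate)
    specificity = 1.0 - pofd      # TN / (TN + FP)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "pod": pod,
        "pss": pod - pofd,        # Peirce Skill Score
        "balanced_accuracy": (pod + specificity) / 2,
        "g_mean": math.sqrt(pod * specificity),
    }


# Hypothetical toy season: 4 avalanche days, 16 non-avalanche days.
y_true = [1] * 4 + [0] * 16
always_no = [0] * 20                        # trivial majority-class forecaster
model = [1, 1, 1, 0] + [1, 1] + [0] * 14    # 3 hits, 1 miss, 2 false alarms

print(imbalance_metrics(y_true, always_no))
print(imbalance_metrics(y_true, model))
```

The trivial "never an avalanche" forecaster reaches 0.8 accuracy yet scores 0 on POD, PSS and geometric mean, while the second forecaster's 0.85 accuracy hides a more informative POD of 0.75 and PSS of 0.625.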
ISSN:0165-232X
DOI:10.1016/j.coldregions.2024.104411
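The abstract does not name its three oversampling and two undersampling techniques or the cost ratios it tried. As a hedged illustration only, the sketch below shows the simplest member of each family: random oversampling of the minority (avalanche) class, random undersampling of the majority (non-avalanche) class, and per-sample weights derived from a cost ratio. The function names, the toy records and the "cost_ratio on avalanche days, 1 elsewhere" convention are hypothetical assumptions, not the paper's method.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def _group_by_label(X: Sequence, y: Sequence[int]) -> Dict[int, List]:
    """Collect the feature rows belonging to each class label."""
    groups: Dict[int, List] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_undersample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Keep every minority row; draw a same-size subset (without replacement) from larger classes."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        kept = rows if len(rows) == n_min else rng.sample(rows, n_min)
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


def random_oversample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        kept = rows + extra
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


def cost_weights(y: Sequence[int], cost_ratio: float) -> List[float]:
    """Hypothetical cost-sensitive weights: a missed avalanche day costs cost_ratio times a false alarm."""
    return [float(cost_ratio) if yi == 1 else 1.0 for yi in y]


# Hypothetical one-feature daily records: 2 avalanche days, 8 non-avalanche days.
X = [(float(i),) for i in range(10)]
y = [1, 1] + [0] * 8

Xu, yu = random_undersample(X, y)
Xo, yo = random_oversample(X, y)
print(Counter(yu), Counter(yo))
print(sum(cost_weights(y, 4)))
```

Resampling is normally applied to the training split only, so the evaluation data keep their natural imbalance. In practice, the imbalanced-learn package provides `RandomOverSampler`, `SMOTE` and `RandomUnderSampler`, and scikit-learn's `LogisticRegression`, `SVC` and `RandomForestClassifier` accept a `class_weight` argument; the abstract does not say whether the authors used these.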