Investigation and application of data balancing and combined discriminant model in rock burst severity prediction

Bibliographic details
Published in: Scientific reports 2024-11, Vol.14 (1), p.29657-24
Main authors: Yan, Shaohong, Liu, Runze, Zhang, Yanbo, Yao, Xulong, Yang, Yueqi, Wang, Qi, Guo, Bin, Wang, Shuai
Format: Article
Language: English
Online access: Full text
Description
Abstract: In the development of intelligent rock burst prediction models, incomplete data coverage and data imbalance are frequently encountered. These issues can cause overfitting, poor generalization, and increased bias in predictive models, which in turn may result in misjudgments and unpredictable losses. To accurately predict rock burst disasters and mitigate or eliminate related threats, this paper proposes a composite prediction model that integrates the Density-Based Nonlinear Resampling (DBNR)-Tomek Link data balancing algorithm with a Bayesian Optimization (BO)-Multilayer Perceptron (MLP)-Random Forest (RF) classifier. First, the study collected and organized 301 field observation records of rock burst disasters, covering various tectonic plates, engineering types, rock origins, rock textures, and rock burst types. Next, from a data analysis perspective, the PCA-SSA-K-means unsupervised clustering algorithm was used to explore the underlying information in the data, validating the rationality of categorizing rock bursts into four grades. Then, using the L2 norm to optimize the dimensionality of the indicators, supplemented by indicator importance ranking and hypothesis testing, three criteria were selected for rock burst intensity grading: the maximum tangential stress of the surrounding rock, the ratio of the maximum tangential stress of the surrounding rock to the uniaxial compressive strength of the rock (stress coefficient), and the elastic energy index. The DBNR-Tomek Link sampling method was then applied to balance the sample data, expanding the sample size to 396 and improving the class proportions from 2:3:4:1 to 1:1:2:1, thereby enhancing the model’s generalization performance. Finally, a BO-MLP-RF composite prediction model was constructed from the BO, MLP, and RF algorithms, with Bayesian Optimization ensuring that the model fits the training data well and generalizes to the test data. Tenfold cross-validation showed that the model’s accuracy is consistently around 92.5%. Combined with the training results of rock burst models on imbalanced datasets, this shows that the MLP model, adept at modeling nonlinear data, and the RF model, skilled in modeling large-scale data, serve as basic classifiers. This demonstr
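The Tomek Link step named in the abstract can be illustrated with a minimal sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; such pairs sit on class boundaries and are candidates for cleaning. The sketch below is not the paper's DBNR-Tomek Link method: the DBNR oversampling stage is not reproduced, the function names and toy data are hypothetical, and the policy of dropping only the member from the more frequent class is an assumption made for illustration.

```python
from collections import Counter
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Sorted (i, j) pairs, i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    links = set()
    for i, j in enumerate(nn):
        if nn[j] == i and labels[i] != labels[j]:
            links.add((min(i, j), max(i, j)))
    return sorted(links)


def drop_majority_side(points, labels):
    """Assumed cleaning policy: for each Tomek link, drop the member whose
    class is more frequent; if both classes are equally frequent, keep both."""
    counts = Counter(labels)
    drop = set()
    for i, j in tomek_links(points, labels):
        if counts[labels[i]] > counts[labels[j]]:
            drop.add(i)
        elif counts[labels[j]] > counts[labels[i]]:
            drop.add(j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical two-feature toy data, not taken from the paper's dataset.
points = [(0.0, 0.0), (0.0, 1.0), (1.5, 0.0), (2.0, 2.0),
          (2.2, 2.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "A", "A", "B", "B", "B"]

links = tomek_links(points, labels)
kept_points, kept_labels = drop_majority_side(points, labels)
```

In this toy set, the only boundary pair is the class-A point at (2.0, 2.0) and the class-B point at (2.2, 2.0); removing the class-A member leaves three samples per class. This brute-force neighbor search is quadratic in the number of samples, which is acceptable for a few hundred records but would need a spatial index for larger sets.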
ISSN:2045-2322
DOI:10.1038/s41598-024-81307-z