A two-stage clustering ensemble algorithm applicable to risk assessment of railway signaling faults

Bibliographic details
Published in: Expert Systems with Applications, 2024-09, Vol. 249, p. 123500, Article 123500
Main authors: Liu, Chang; Yang, Shiwu
Format: Article
Language: English
Description
Abstract: Knowledge graph (KG) modeling builds a connected network of hazard and fault events from historical reports, which helps extract risk factors and propagation features for railway signaling safety. Given the large graph structure of such a knowledge system, the central problem addressed in this paper is how to efficiently and accurately transform textual event information into risk-level predictions. To establish a quantitative risk assessment method based on KG modeling, we introduce text clustering to partition the short-text entity dataset in the KG automatically, assign a standardized entity name to each cluster, and then complete the calculation and analysis using the characteristic parameter formula. This greatly reduces the labor and time spent on data annotation and on repeatedly processing near-duplicate text. In particular, we propose a two-stage clustering ensemble algorithm (TSCEA) that combines base clusterer optimization with hierarchical clustering prediction. Its novelty lies in describing sample similarity with a co-association matrix while accounting for differences in how much each sample contributes to the final decision. Stage One selects and optimizes several base clusterers built on different method kernels; Stage Two weights the contributions of the cluster core and the cluster halo differently to make the final clustering decision; the ensemble strategy links the two stages. A KG built from field investigations is used to run experiments with and analyze the proposed algorithm. For comparison, another clustering ensemble algorithm (based on a direct weighting strategy) and each individual base clusterer are also applied, and clustering performance is evaluated with external indices. The results indicate that the proposed clustering ensemble algorithm outperforms the alternatives on multiple indices, significantly improving clustering accuracy and robustness.
ISSN: 0957-4174; 1873-6793
DOI: 10.1016/j.eswa.2024.123500
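The co-association matrix mentioned in the abstract can be illustrated with the classic evidence-accumulation scheme: entry (i, j) is the fraction of base clusterings that place samples i and j in the same cluster, and a hierarchical (here average-linkage) clustering on that matrix yields a consensus partition. The following is a minimal pure-Python sketch under that assumption only; it is not the authors' TSCEA and omits both the Stage One base-clusterer optimization and the Stage Two core/halo weighting, which the abstract does not specify. The function names `co_association`, `average_linkage` and `consensus_labels`, and the toy labelings, are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix whose entry (i, j) is the fraction of base
    clusterings that assign samples i and j to the same cluster."""
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def average_linkage(ca, k):
    """Agglomerative clustering on the similarity matrix `ca`.

    Starts from singletons and repeatedly merges the pair of clusters with
    the highest mean pairwise co-association (first pair wins on ties)
    until k clusters remain.
    """
    clusters = [[i] for i in range(len(ca))]
    while len(clusters) > k:
        best_pair, best_sim = None, -1.0
        for a, b in combinations(range(len(clusters)), 2):
            sims = [ca[i][j] for i in clusters[a] for j in clusters[b]]
            sim = sum(sims) / len(sims)
            if sim > best_sim:
                best_pair, best_sim = (a, b), sim
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def consensus_labels(labelings, k):
    """Hypothetical helper: consensus partition as a flat label list,
    with clusters numbered by their smallest member index."""
    clusters = average_linkage(co_association(labelings), k)
    labels = [0] * len(labelings[0])
    for cid, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = cid
    return labels


if __name__ == "__main__":
    # Three toy base clusterings of six short entity texts (illustrative only).
    base = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0],
    ]
    print(consensus_labels(base, 2))
```

With these toy labelings the script prints `[0, 0, 0, 1, 1, 1]`: even though the second base clusterer splits the samples into three groups, the accumulated co-association evidence recovers the two-group structure shared by the other two. Average linkage is only one common choice for the consensus step; single or complete linkage on the same matrix are alternatives, and a sample-weighting scheme such as the core/halo distinction described in the abstract would modify how the matrix or the merge criterion is computed.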