RSFD: A rough set-based feature discretization method for meteorological data

Bibliographic details
Published in: Frontiers in Environmental Science, September 2022, Vol. 10
Authors: Zeng, Lirong; Chen, Qiong; Huang, Mengxing
Format: Article
Language: English
Description
Abstract: Meteorological data mining aims to discover hidden patterns in large volumes of available meteorological data. As one of the most relevant big data preprocessing technologies, feature discretization transforms continuous features into discrete ones to improve the efficiency of meteorological data mining algorithms. To address the strong interactions among multiple attributes, noise interference, and the difficulty of obtaining prior knowledge in meteorological data, we propose a rough set-based feature discretization method for meteorological data (RSFD). First, we calculate the information gain of each candidate breakpoint of a meteorological attribute to split it into intervals. Then, we use the chi-square test to merge these discrete intervals. Finally, we take the variation of the indiscernibility relation in rough set theory as the evaluation criterion for the discretization scheme. Scanning each attribute in turn with this split-then-merge strategy, we obtain the optimal discrete feature set. We compare RSFD with state-of-the-art discretization methods on meteorological data. Experiments show that our method achieves higher classification accuracy on meteorological data and produces fewer discrete intervals while preserving data consistency.
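The abstract names three stages (information-gain splitting, chi-square merging, and a rough-set indiscernibility criterion) without giving their exact form. The following is a minimal sketch of a generic split-then-merge discretizer built from those stages, not the authors' RSFD implementation. Everything specific in it is an assumption: the function names (entropy, best_split, split_cuts, chi2, consistency, rsfd_sketch) are hypothetical, the recursive stopping rule (min_gain, max_depth) is invented for illustration, chi_threshold=2.71 is roughly the 90% chi-square critical value for one degree of freedom (two classes), and the rough-set dependency degree (share of objects in the positive region) stands in for "variation of the indiscernibility relation".

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (cut, gain) for the candidate breakpoint with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidates are midpoints between distinct neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


def split_cuts(values, labels, min_gain=0.01, depth=0, max_depth=3):
    """Top-down splitting; min_gain and max_depth are a hypothetical stopping rule."""
    cut, gain = best_split(values, labels)
    if cut is None or gain < min_gain or depth >= max_depth:
        return []
    lo = [(v, y) for v, y in zip(values, labels) if v <= cut]
    hi = [(v, y) for v, y in zip(values, labels) if v > cut]

    def recurse(part):
        return split_cuts([v for v, _ in part], [y for _, y in part],
                          min_gain, depth + 1, max_depth)

    return recurse(lo) + [cut] + recurse(hi)


def chi2(a, b, classes):
    """Chi-square statistic of the 2 x k class-count table of two adjacent intervals."""
    ca, cb = Counter(a), Counter(b)
    n = len(a) + len(b)
    stat = 0.0
    for counts, size in ((ca, len(a)), (cb, len(b))):
        for k in classes:
            expected = size * (ca[k] + cb[k]) / n
            if expected > 0:  # skip classes absent from both intervals
                stat += (counts[k] - expected) ** 2 / expected
    return stat


def consistency(rows, labels):
    """Rough-set dependency degree: share of objects whose indiscernibility class is pure."""
    classes_of = {}
    for row, y in zip(rows, labels):
        classes_of.setdefault(tuple(row), set()).add(y)
    return sum(len(classes_of[tuple(r)]) == 1 for r in rows) / len(rows)


def rsfd_sketch(columns, labels, chi_threshold=2.71):
    """Split each attribute by information gain, then merge adjacent intervals by
    chi-square, keeping a merge only if the dependency degree does not drop."""
    cuts = [split_cuts(col, labels) for col in columns]
    classes = sorted(set(labels))

    def encode():
        # map every object to its tuple of interval indices (one per attribute)
        coded = [[bisect.bisect_left(c, v) for v in col] for c, col in zip(cuts, columns)]
        return list(zip(*coded))

    base = consistency(encode(), labels)
    for j, col in enumerate(columns):  # scan attributes in turn
        while cuts[j]:
            bins = [[] for _ in range(len(cuts[j]) + 1)]
            for v, y in zip(col, labels):
                bins[bisect.bisect_left(cuts[j], v)].append(y)
            scores = [chi2(bins[i], bins[i + 1], classes) for i in range(len(cuts[j]))]
            i = min(range(len(scores)), key=scores.__getitem__)
            if scores[i] >= chi_threshold:
                break  # even the most similar neighbours differ significantly
            removed = cuts[j].pop(i)  # merge intervals i and i + 1
            if consistency(encode(), labels) < base:
                cuts[j].insert(i, removed)  # undo: objects with different labels became indiscernible
                break
    return cuts


if __name__ == "__main__":
    temperature = [1, 2, 3, 4, 5, 6, 7, 8]  # toy data, not from the paper
    rain = [0, 0, 0, 0, 1, 1, 1, 1]
    print(rsfd_sketch([temperature], rain))  # [[4.5]]
```

The merge loop is greedy and stops at the first rejected merge to keep the sketch short; a fuller implementation might instead try the next-lowest chi-square pair, recompute the baseline after each accepted merge, or use a significance level chosen from the number of classes.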
ISSN: 2296-665X
DOI: 10.3389/fenvs.2022.1013811