Association rule mining with a special rule coding and dynamic genetic algorithm for air quality impact factors in Beijing, China

Bibliographic details
Published in: PLoS ONE 2024-03, Vol. 19 (3), p. e0299865
Main authors: Wu, Xiaoxuan; Wen, Qiang; Zhu, Jun
Format: Article
Language: English
Description
Summary: Understanding air quality requires a comprehensive understanding of the factors that influence it. Most association rule techniques focus on high-frequency terms, ignoring the potential importance of low-frequency terms and wasting storage space. This paper therefore proposes a dynamic genetic association rule mining algorithm, which combines an improved dynamic genetic algorithm with association rule mining to uncover the importance of low-frequency terms. First, in the chromosome coding phase of the genetic algorithm, a multi-information coding strategy is proposed that selectively stores similar values of different levels in one storage unit. This avoids storing all the values at once and facilitates efficient mining of valid rules later. Second, evaluation indicators used in association rule mining, such as support, confidence and promotion (lift), are weighted to form a new evaluation index, avoiding the need to set a minimum threshold for high-interest rules. Finally, to improve rule mining performance, the crossover rate and mutation rate are set dynamically to improve the search efficiency of the algorithm. In the experimental stage, the paper uses the 2016 annual air quality data set of Beijing to verify three aspects: the effectiveness of the unit point multi-information coding strategy in reducing rule storage space, the effectiveness of mining rules formed by low-frequency item sets, and the effectiveness of combining the rule mining algorithm with a swarm intelligence optimization algorithm in terms of search time and convergence.
The unit point multi-information coding strategy reduced rule storage consumption by 50%. The new evaluation index mined more interesting rules, with interest levels of up to 90%, including rules formed by lower-frequency terms. Search time was reduced by about 20% compared with some meta-heuristic algorithms, and convergence improved.
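
The summary describes a weighted evaluation index and dynamically set crossover and mutation rates without giving formulas. The following Python sketch illustrates the general idea only and is not the authors' implementation. The weights, the squashing of lift into the unit interval, the linear rate schedule, and the function names (`rule_metrics`, `weighted_fitness`, `dynamic_rates`) are all assumptions made for illustration.

```python
from typing import FrozenSet, Sequence, Tuple

Transaction = FrozenSet[str]


def rule_metrics(
    transactions: Sequence[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    consequent_support = n_c / n
    lift = confidence / consequent_support if consequent_support else 0.0
    return support, confidence, lift


def weighted_fitness(
    support: float,
    confidence: float,
    lift: float,
    weights: Tuple[float, float, float] = (0.2, 0.4, 0.4),  # assumed weights
) -> float:
    """Combine the three indicators into one score, so no minimum
    support or confidence threshold is needed to rank rules.

    Lift is unbounded, so it is mapped into [0, 1) with lift / (1 + lift).
    This normalization is an assumption, not taken from the paper.
    """
    w_s, w_c, w_l = weights
    return w_s * support + w_c * confidence + w_l * (lift / (1.0 + lift))


def dynamic_rates(
    generation: int,
    max_generations: int,
    crossover_range: Tuple[float, float] = (0.9, 0.6),  # assumed start, end
    mutation_range: Tuple[float, float] = (0.1, 0.01),  # assumed start, end
) -> Tuple[float, float]:
    """Linearly interpolate crossover and mutation rates over the run.

    A linear schedule is one simple way to make the rates dynamic;
    the paper may use a different (for example fitness-based) rule.
    """
    frac = min(1.0, generation / max(1, max_generations))
    pc = crossover_range[0] + (crossover_range[1] - crossover_range[0]) * frac
    pm = mutation_range[0] + (mutation_range[1] - mutation_range[0]) * frac
    return pc, pm


# Hypothetical discretized air quality records (item names are invented).
records = [
    frozenset({"PM2.5_high", "wind_low", "AQI_poor"}),
    frozenset({"PM2.5_high", "wind_low", "AQI_poor"}),
    frozenset({"PM2.5_low", "wind_high", "AQI_good"}),
    frozenset({"PM2.5_high", "wind_high", "AQI_good"}),
]
s, c, l = rule_metrics(
    records, frozenset({"PM2.5_high", "wind_low"}), frozenset({"AQI_poor"})
)
score = weighted_fitness(s, c, l)
```

In a genetic algorithm, `weighted_fitness` would serve as the fitness of a chromosome that encodes one rule, and `dynamic_rates` would be called once per generation to pick the crossover and mutation probabilities.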
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0299865