Mining frequent generators and closures in data streams with FGC-Stream

Bibliographic details
Published in: Knowledge and Information Systems, 2023-08, Vol. 65 (8), pp. 3295-3335
Main authors: Martin, Tomas; Valtchev, Petko; Roux, Louis-Romain
Format: Article
Language: English
Description
Abstract: Mining frequent itemsets (FIs) from data streams is a challenging task due to the limited resources available w.r.t. the typically large size of the result and the need for frequent recalculations due to data evolution. Therefore, the mining of condensed representations, e.g. frequent closures (FCIs) or generators (FGIs), instead of plain FIs, has been explored. So far, the tasks of mining FGIs and FCIs have only been addressed separately over data streams. Yet both itemset families combine in the solutions of a range of practical problems, and they also underlie the definition of handy association rule bases. To date, the joint mining task can only be approached by combining two dedicated miners. As a remedy, we propose a holistic approach rooted in the support set-based equivalence classes underlying a transaction dataset: the ensuing FGC-Stream miner exploits some mathematical results about those classes' evolution to efficiently update both FCIs and FGIs. Thus, targeting a sliding window mode, where the window over a stream expands and shrinks, we enhance results from formal concept analysis to design an efficient expansion procedure. On window shrinking, we exploit some thoroughly new results about class evolution. Overall, FGC-Stream achieves significant effort factoring through the collaborative maintenance of FCIs and FGIs. As a result, when evaluated experimentally, it managed to largely outperform its only FGI mining competitor while keeping up with two of the most efficient FCI miners. This outcome confirms that FGC-Stream will dominate any combination of miners for the joint task. This article is an extended version of our paper [27] presented at the 21st International Conference on Data Mining.
ISSN: 0219-1377, 0219-3116
DOI: 10.1007/s10115-023-01852-3
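
Illustration: the abstract refers to support set-based equivalence classes, whose unique maximal member is a closure and whose minimal members are generators. The Python sketch below only illustrates those definitions on a tiny window by brute-force enumeration. It is not the FGC-Stream algorithm, which updates the classes incrementally rather than recomputing them; the names `support_set` and `mine_classes` and the sample transactions are hypothetical and do not come from the article.

```python
from itertools import combinations


def support_set(itemset, window):
    """Indices of the window's transactions that contain every item of itemset."""
    return frozenset(i for i, t in enumerate(window) if itemset <= t)


def mine_classes(window, min_support):
    """Brute-force sketch (assumes min_support >= 1).

    Groups all frequent itemsets by their support set and returns
    {support_set: (closure, generators)} for each equivalence class.
    """
    items = sorted(set().union(*window)) if window else []
    classes = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = support_set(itemset, window)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(itemset)

    result = {}
    for tids, members in classes.items():
        # Closure: the items shared by every transaction in the support set.
        closure = frozenset.intersection(*(window[i] for i in tids))
        # Generators: members with no proper subset in the same class.
        generators = [m for m in members if not any(o < m for o in members)]
        result[tids] = (closure, generators)
    return result


# Hypothetical three-transaction window over items a, b, c.
window = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
classes = mine_classes(window, min_support=2)
for tids, (closure, generators) in classes.items():
    print(sorted(tids), sorted(closure), [sorted(g) for g in generators])
```

In a sliding window setting, this naive version would have to be rerun from scratch after each added or removed transaction, which is the cost that an incremental miner is designed to avoid.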