Mining frequent generators and closures in data streams with FGC-Stream
Published in: | Knowledge and information systems 2023-08, Vol.65 (8), p.3295-3335 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Mining frequent itemsets (FIs) from data streams is a challenging task due to the limited resources available w.r.t. the typically large size of the result and the need for frequent recalculations due to data evolution. Therefore, the mining of condensed representations, e.g. frequent closures (FCIs) or generators (FGIs), instead of plain FIs, has been explored. So far the tasks of mining FGIs and FCIs have only been addressed separately over data streams. Yet, both itemset families combine in the solutions of a range of practical problems while they also underlie the definition of handy association rule bases. To date, the joint mining task can only be approached by combining two dedicated miners. As a remedy, we propose a holistic approach rooted in the support set-based equivalence classes underlying a transaction dataset: the ensuing FGC-Stream miner exploits some mathematical results about those classes’ evolution to efficiently update both FCIs and FGIs. Thus, targeting a sliding window mode, where the window over a stream expands and shrinks, we enhance results from formal concept analysis to design an efficient expansion procedure. On window shrinking, we exploit some thoroughly new results about class evolution. Overall, FGC-Stream achieves significant effort factoring through the collaborative maintenance of FCIs and FGIs. As a result, when confronted experimentally, it managed to largely outperform its sole FGI mining competitor while keeping up with two of the most efficient FCI miners. This outcome confirms that FGC-Stream will dominate any combination of miners for the joint task. This article is an extended version of our paper [27] presented at the 21st International Conference on Data Mining. |
---|---|
ISSN: | 0219-1377 0219-3116 |
DOI: | 10.1007/s10115-023-01852-3 |
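
The summary refers to support set-based equivalence classes, with closures (FCIs) and generators (FGIs) as their extreme members. As a rough illustration only, the sketch below groups the frequent itemsets of a tiny static dataset by support set and reads off the maximal member (closure) and the minimal members (generators) of each class. It is a brute-force enumeration, not the FGC-Stream algorithm: it does no incremental or sliding-window maintenance. The function names, the toy transactions, and the choice to ignore the empty itemset are all assumptions made for this example.

```python
from itertools import combinations


def support_set(itemset, transactions):
    """Return the ids of the transactions that contain every item of `itemset`."""
    return frozenset(tid for tid, t in enumerate(transactions) if itemset <= t)


def closures_and_generators(transactions, min_support):
    """Brute-force sketch (hypothetical helper, exponential in the number of items).

    Returns {closure: (support, [generators])} for every equivalence class of
    frequent, non-empty itemsets that share the same support set.
    """
    items = sorted(set().union(*transactions))
    classes = {}  # support set -> list of itemsets having exactly that support set
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = support_set(itemset, transactions)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(itemset)

    result = {}
    for tids, members in classes.items():
        # The union of itemsets with the same support set keeps that support
        # set, so it is the unique maximal member of the class: the closure.
        closure = frozenset().union(*members)
        # Generators are the members with no proper subset in the same class.
        generators = [m for m in members if not any(o < m for o in members)]
        result[closure] = (len(tids), sorted(generators, key=sorted))
    return result


# Toy dataset (assumed for illustration): four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
result = closures_and_generators(transactions, min_support=2)
for closure, (support, generators) in result.items():
    print(sorted(closure), support, [sorted(g) for g in generators])
```

On this toy data the itemsets {a}, {b} and {a, b} occur in the same three transactions, so they form one class with closure {a, b} and generators {a} and {b}. A stream miner has to keep such classes up to date as transactions enter and leave the window, which this static sketch does not attempt.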