Efficiently mining frequent itemsets with weight and recency constraints

In the past, a novel framework named recent weighted frequent itemset mining (RWFIM) and two projection-based algorithms, RWFIM-P and RWFIM-PE, were proposed to consider both the relative importance of items (item weights) and the recency of patterns. However, the projection-and-test mechanism used...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2017-10, Vol.47 (3), p.769-792
Hauptverfasser: Lin, Jerry Chun-Wei, Gan, Wensheng, Fournier-Viger, Philippe, Chao, Han-Chieh, Hong, Tzung-Pei
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the past, a novel framework named recent weighted frequent itemset mining (RWFIM) and two projection-based algorithms, RWFIM-P and RWFIM-PE, were proposed to consider both the relative importance of items (item weights) and the recency of patterns. However, the projection-and-test mechanism used by these algorithms to discover recent weighted frequent itemsets (RWFIs) in a recursive way may have poor performance when the database is dense or contains long transactions. To address this issue, an efficient tree-based RWFI-Mine algorithm is proposed in this paper for mining RWFIs, which considers both weight and the recency of patterns. A novel Set-enumeration tree called the recent weighted frequent (RWF)-tree and a sorted downward closure property of RWFIs for the RWF-tree are proposed. Moreover, two data structures, named element (E)-table and recent weighted frequent (RWF)-table, are designed to store the information needed for discovering RWFIs. RFWI-Mine discovers RWFIs in a recursive way without candidate generation, thus reducing the computational costs and memory requirements for mining RWFIs. A second improved algorithm named RWFI-EMine algorithm is further proposed to avoid building E-tables and RWF-tables for unpromising itemsets and their child nodes by adopting the Estimated Weight of 2-itemset Pruning (EW2P) strategy. Extensive experiments are conducted on several real-world and synthetic datasets to evaluate the performance of the two proposed algorithms, and the ratio between the number of generated RWFIs and WFIs. Results show that the proposed algorithms outperform not only the traditional PWA algorithm for WFIM, but also the state-of-the-art RWFIM-P and RWFIM-PE algorithms for RWFIM, in terms of runtime, memory usage and scalability.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-017-0915-2