DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets

Bibliographic details
Published in: Expert systems with applications, 2012-06, Vol. 39 (8), p. 7196-7206
Main authors: Vo, Bay; Hong, Tzung-Pei; Le, Bac
Format: Article
Language: English
Description
Summary:
► A new approach is developed for mining frequent closed itemsets.
► The Dynamic Bit-Vector (DBV) approach is presented.
► Algorithms for quickly computing the intersection of two DBVs are proposed.
► A lookup table is used for computing the support of itemsets.
► An algorithm based on DBV and the subsumption concept is proposed for mining frequent closed itemsets.

Frequent closed itemsets (FCI) play an important role in quickly pruning redundant rules, so many algorithms for mining FCI have been developed. Algorithms based on vertical data formats have the advantage that they need to scan the database only once and can compute the support of itemsets quickly. In recent years, the BitTable (Dong & Han, 2007) and IndexBitTable (Song, Yang, & Xu, 2008) approaches have been applied to mining frequent itemsets, with significant results. However, they always use a fixed-size Bit-Vector for each item, with length equal to the number of transactions in the database. This increases both the memory needed to store the Bit-Vectors and the time needed to compute intersections among them. Moreover, they apply only to mining frequent itemsets; no BitTable-based algorithm for mining FCI has been proposed.

This paper introduces a new method for mining FCI from transaction databases. First, the Dynamic Bit-Vector (DBV) approach is presented, together with algorithms for quickly computing the intersection of two DBVs. A lookup table is used to quickly compute the support of an itemset (the number of 1 bits in its DBV). Next, the subsumption concept, which saves memory and computing time, is discussed. Finally, an algorithm based on DBV and the subsumption concept is proposed for fast mining of frequent closed itemsets. We compare our method with CHARM and find that the proposed algorithm is more efficient than CHARM in both mining time and memory usage.
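The summary does not spell out how a DBV is laid out, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a DBV drops the leading and trailing zero bytes of an ordinary bit-vector and remembers the index of its first non-zero byte; the names `DBV`, `make_dbv`, `trim`, `intersect`, `support`, and `BIT_COUNT` are hypothetical. It illustrates the two ideas named above: intersecting only the overlapping byte range of two vectors, and counting 1 bits with a 256-entry lookup table.

```python
from dataclasses import dataclass

# Lookup table: number of 1 bits in every possible byte value.
BIT_COUNT = [bin(b).count("1") for b in range(256)]


@dataclass(frozen=True)
class DBV:
    """Assumed layout: `pos` is the index of the first non-zero byte,
    `data` holds the bytes from there to the last non-zero byte."""
    pos: int
    data: bytes


def trim(pos, data):
    """Drop leading and trailing zero bytes, adjusting the start position."""
    start, end = 0, len(data)
    while start < end and data[start] == 0:
        start += 1
    while end > start and data[end - 1] == 0:
        end -= 1
    if start == end:
        return DBV(0, b"")  # empty vector: no transaction contains the itemset
    return DBV(pos + start, data[start:end])


def make_dbv(tids, n_transactions):
    """Build a DBV from 0-based transaction ids."""
    raw = bytearray((n_transactions + 7) // 8)
    for t in tids:
        raw[t // 8] |= 1 << (t % 8)
    return trim(0, bytes(raw))


def intersect(a, b):
    """AND two DBVs, touching only the byte range where both are stored."""
    lo = max(a.pos, b.pos)
    hi = min(a.pos + len(a.data), b.pos + len(b.data))
    if lo >= hi:
        return DBV(0, b"")
    out = bytes(a.data[i - a.pos] & b.data[i - b.pos] for i in range(lo, hi))
    return trim(lo, out)


def support(v):
    """Support of an itemset = number of 1 bits, summed via the lookup table."""
    return sum(BIT_COUNT[byte] for byte in v.data)


# Usage: two items in a database of 40 transactions.
a = make_dbv([8, 9, 17], 40)
b = make_dbv([9, 17, 30], 40)
ab = intersect(a, b)  # transactions containing both items: 9 and 17
print(ab.pos, support(ab))
```

Under this assumed layout, an item that occurs only in a narrow band of transactions stores a few bytes instead of a full-length vector, and an intersection never visits bytes outside the overlap.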
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2012.01.062