TRICE: Mining Frequent Itemsets by Iterative TRimmed Transaction LattICE in Sparse Big Data

Bibliographic details
Published in: IEEE Access 2019, Vol. 7, p. 181688-181705
Main authors: Yasir, Muhammad; Habib, Muhammad Asif; Ashraf, Muhammad; Sarwar, Shahzad; Chaudhry, Muhammad Umar; Shahwani, Hamayoun; Ahmad, Mudassar; Muhammad Nadeem Faisal, Ch
Format: Article
Language: English
Description
Abstract: Sparseness is often witnessed in big data emanating from a variety of sources, including IoT, pervasive computing, and behavioral data. Frequent itemset mining is the first and foremost step of association rule mining, which is a distinguished unsupervised machine learning problem. However, techniques for frequent itemset mining are least explored for sparse real-world data, showing somewhat comparable performance. On the contrary, the methods are adequately validated for dense data and stand apart from each other in terms of performance. Hence, there arises an immense need for evaluating these techniques as well as proposing new ones for large sparse real-world datasets. In this study, a novel method, Mining Frequent Itemsets by Iterative TRimmed Transaction LattICE (TRICE), is proposed. TRICE iteratively generates combinations of varying-sized trimmed subsets of I, where I denotes the set of distinct items in a database. Extensive experiments are conducted to assess TRICE against the HARPP, FP-Growth, optimized SaM, and optimized RElim algorithms. The experimental results show that TRICE outperforms all these algorithms in terms of both running time and memory consumption. TRICE maintains a substantial performance gap for all sparse real-world datasets on all minimum support thresholds. Moreover, the memory use of the optimized SaM and RElim algorithms has been assessed for the first time.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2959878
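
To make the task named in the abstract concrete, the following is a minimal sketch of frequent itemset mining on a toy database. It is not the TRICE algorithm, whose details are not given in this record. It is a naive baseline that assumes short, sparse transactions: it drops infrequent items from each transaction (a simple form of trimming) and then counts every combination of the remaining items. The function name `frequent_itemsets`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets.

    Naive illustration only; this is NOT the TRICE algorithm from the paper.
    min_support is an absolute count of transactions.
    """
    # Pass 1: count each item at most once per transaction.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Trim: an infrequent item cannot belong to any frequent itemset,
    # so it can be removed from every transaction up front.
    trimmed = [sorted(set(t) & frequent_items) for t in transactions]

    # Pass 2: enumerate every non-empty combination of each trimmed
    # transaction. This is exponential in transaction length, which is
    # why the sketch assumes short, sparse transactions.
    counts = Counter()
    for t in trimmed:
        for k in range(1, len(t) + 1):
            counts.update(combinations(t, k))

    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy database of four transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["bread", "milk", "eggs"],
]
result = frequent_itemsets(transactions, min_support=2)
```

With a minimum support of 2, `eggs` is trimmed away in the first pass, and the only frequent itemset larger than a single item is `("bread", "milk")`, which occurs in three transactions.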