SAT‐based and CP‐based declarative approaches for Top‐Rank‐K closed frequent itemset mining
Published in: | International journal of intelligent systems 2021-01, Vol.36 (1), p.112-151 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Top‐Rank‐K Frequent Itemset (or Pattern) Mining (FPM) is an important data mining task in which the user decides on the number of top frequency ranks of patterns (itemsets) to mine from a transactional dataset. This problem does not require the minimum support threshold parameter that is typically used in FPM problems. Rather, the algorithms solving the Top‐Rank‐K FPM problem are given K, the number of frequency ranks of itemsets required, and compute the threshold internally. This paper presents two declarative approaches to tackle the Top‐Rank‐K Closed FPM problem. The first approach is Boolean Satisfiability‐based (SAT‐based): we propose an effective encoding for the problem along with an efficient algorithm employing this encoding. The second approach is CP‐based, that is, it uses Constraint Programming, where a simple CP model is exploited in an innovative manner to mine the Top‐Rank‐K closed frequent itemsets from transactional datasets. Both approaches are evaluated experimentally against other declarative and imperative algorithms. The proposed SAT‐based approach significantly outperforms IM, another SAT‐based approach, and outperforms the proposed CP‐based approach on sparse and moderate datasets, whereas the latter excels on dense datasets. An extensive study has been conducted to assess the proposed approaches in terms of their feasibility, performance factors, and practicality of use. |
---|---|
ISSN: | 0884-8173 1098-111X |
DOI: | 10.1002/int.22294 |
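
To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of Top-Rank-K closed frequent itemset mining. It is not the SAT-based or CP-based method of the paper; it only enumerates itemsets directly to show how K determines the support threshold. The function name `top_rank_k_closed` and the toy transactions are assumptions made for illustration, and the enumeration is exponential in the number of items, so it is suitable for tiny inputs only.

```python
from itertools import combinations


def top_rank_k_closed(transactions, k):
    """Brute-force illustration (hypothetical helper, not the paper's algorithm).

    Returns a list of (support, itemsets) pairs for the k highest distinct
    support values among closed itemsets, highest support first.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Count the support of every non-empty itemset that occurs at least once.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count > 0:
                support[itemset] = count

    # An itemset is closed if no proper superset has the same support.
    closed = {
        s: c
        for s, c in support.items()
        if not any(s < other and c == support[other] for other in support)
    }

    # The k highest distinct supports are the requested ranks; the smallest
    # of them acts as the internally computed minimum support threshold.
    ranks = sorted(set(closed.values()), reverse=True)[:k]
    return [
        (value, sorted(sorted(s) for s, c in closed.items() if c == value))
        for value in ranks
    ]


# Toy data (assumed for illustration).
toy = [{"a", "b"}, {"a", "b"}, {"a"}]
result = top_rank_k_closed(toy, 2)
```

In the toy data, `{"b"}` has support 2 but is not closed, because its superset `{"a", "b"}` has the same support, so only `{"a"}` (rank 1) and `{"a", "b"}` (rank 2) are reported. The support of the last returned rank is the threshold that a conventional FPM algorithm would have needed as input.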