Incremental Top-k High Utility Pattern Mining and Analyzing Over the Entire Accumulated Dynamic Database

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 77605-77620
Main authors: Lee, Chanhee; Kim, Hanju; Cho, Myungha; Kim, Hyeonmo; Vo, Bay; Lin, Jerry Chun-Wei; Fournier-Viger, Philippe; Yun, Unil
Format: Article
Language: English
Online access: Full text
Description
Summary: Top-k high utility pattern mining, which extracts the highest top-k patterns that the users want to find, has been actively studied. Most previous studies in this domain have focused on static databases, where data insertions do not occur. In the real world, however, various applications continuously generate new data, and existing top-k high utility pattern mining algorithms devised to process static databases cannot handle incremental databases. Although some methods can handle stream data, they have the limitation of processing a portion of the database rather than the entire accumulated database. In this paper, we suggest an efficient incremental mining method that discovers top-k high utility patterns from the entire accumulated database. The proposed approach utilizes a list structure that stores minimal utility information required for the mining process and does not generate candidate itemsets. The suggested algorithm processes the incremental data with a single database scan and restructures the list for efficient mining. Moreover, four efficient threshold raising techniques along with a restoring technique are utilized to calculate the optimal threshold value in an accumulated incremental environment. The results of the experiments on runtime, memory, and scalability show that the suggested method efficiently processes the entire incremental database.
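To make the problem statement in the summary concrete, the following Python sketch shows what "top-k high utility patterns over the entire accumulated database" means on a toy scale. It is not the algorithm from the article: the class name `ToyIncrementalTopK`, its methods, and the input format (each transaction as a dict mapping an item to its utility in that transaction) are assumptions made for illustration. The sketch enumerates every itemset by brute force, which is exponential and is exactly the cost that the list structure and pruning described in the summary are meant to avoid. The only ideas it borrows are that each new batch is scanned once, that utilities accumulate over all batches, and that the k-th best utility seen so far can serve as a rising minimum threshold.

```python
import heapq
from itertools import combinations


class ToyIncrementalTopK:
    """Brute-force illustration only; NOT the method proposed in the article."""

    def __init__(self, k):
        self.k = k
        # itemset (sorted tuple of items) -> utility summed over ALL batches so far
        self.utility = {}

    def add_batch(self, transactions):
        """Scan each new transaction once and update the accumulated utilities.

        Each transaction is a dict {item: utility of that item in the
        transaction} (a hypothetical input format).
        """
        for tx in transactions:
            items = sorted(tx)
            for size in range(1, len(items) + 1):
                for itemset in combinations(items, size):
                    u = sum(tx[item] for item in itemset)
                    self.utility[itemset] = self.utility.get(itemset, 0) + u

    def top_k(self):
        """Return (patterns, threshold) using a size-k min-heap.

        Once the heap holds k patterns, its smallest utility acts as the
        current minimum threshold; patterns at or below it are skipped.
        """
        heap = []
        threshold = 0
        for itemset, u in self.utility.items():
            if len(heap) < self.k:
                heapq.heappush(heap, (u, itemset))
            elif u > threshold:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == self.k:
                threshold = heap[0][0]  # raise the threshold
        return sorted(heap, reverse=True), threshold


miner = ToyIncrementalTopK(k=2)
miner.add_batch([{"a": 5, "b": 2}, {"b": 4, "c": 3}])  # initial data
miner.add_batch([{"a": 1, "c": 6}])                     # later insertion
patterns, threshold = miner.top_k()
print(patterns, threshold)
```

Because the utilities are kept for the whole history rather than for a sliding window, the second batch changes the ranking of patterns that first appeared in the initial data, which is the distinction the summary draws between accumulated and stream-based mining.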
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3406562