Finding efficiencies in frequent pattern mining from big uncertain data

Many existing data mining algorithms search interesting patterns from transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	World wide web (Bussum) 2017-05, Vol.20 (3), p.571-594
Hauptverfasser:	Leung, Carson Kai-Sang, MacKinnon, Richard Kyle, Jiang, Fan
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Computer Science Constraint modelling Data management Data mining Database Management Efficiency Information Systems Applications (incl.Internet) Operating Systems Searching User satisfaction
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Many existing data mining algorithms search interesting patterns from transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood of these items to be present in the transaction. When compared with mining from precise data, the search space for mining from uncertain data is much larger due to the presence of the existential probabilities. This problem is worsened as we are moving to the era of Big data. Furthermore, in many real-life applications, users may be interested in a tiny portion of this large search space for Big data mining. Without providing opportunities for users to express the interesting patterns to be mined, many existing data mining algorithms return numerous patterns—out of which only some are interesting. In this article, we propose an algorithm that allows users to express their interest in terms of constraints, uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified anti-monotone and monotone constraints, as well as balance the load.
ISSN:	1386-145X 1573-1413
DOI:	10.1007/s11280-016-0411-3