Approximating concept stability using variance reduction techniques

Bibliographic details
Published in: Discrete Applied Mathematics, 2020-02, Vol. 273, p. 117-135
Main authors: Ibrahim, Mohamed-Hamza; Missaoui, Rokia
Format: Article
Language: English
Description
Summary: Finding actionable patterns in massive data is a crucial task in data mining applications. In Formal Concept Analysis (FCA), concept stability is one of the commonly used measures for assessing the interestingness of formal concepts, and hence for selecting relevant patterns of this type. Accurate and scalable computation of concept stability remains a key challenge. While exact methods for computing stability can be effective in small applications, their algorithmic complexity is at least quadratic in the size of the input lattice, which can itself be exponential in the size of the context. Approximation algorithms such as Monte Carlo sampling (MCS) have therefore been applied in practice. However, MCS often converges slowly and yields inaccurate stability estimates. In this paper, we introduce a new set of approximation methods that estimate the stability index using variance reduction techniques. Specifically, we adapt Latin hypercube sampling (LHS), scrambled Sobol sampling (SoS) and Latin supercube sampling (LSS) to exploit stratification, low-discrepancy sequences and hybridization in order to improve the convergence rate. In contrast to the pure randomness of MCS, the proposed methods aim to spread the sample points more evenly across all possible subsets of the concept intent (or extent), so that all areas of the intent powerset space are properly represented. Our experiments on several formal contexts illustrate the efficiency of LHS, SoS and LSS over MCS.
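The contrast between plain Monte Carlo and stratified sampling described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the toy context, the function names and the choice of formulation (the fraction of subsets of the intent whose derivation equals the concept's extent) are assumptions made for illustration. Each sample point in the unit cube is mapped to a subset by including an attribute when its coordinate falls below 0.5.

```python
import random
from itertools import combinations

# Hypothetical toy formal context: object -> set of attributes.
CONTEXT = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"a", "c"},
    "o4": {"b", "c"},
    "o5": {"a", "b", "c"},
}


def extent_of(attrs):
    """Derivation operator: objects that have every attribute in attrs."""
    return frozenset(o for o, has in CONTEXT.items() if attrs <= has)


def exact_stability(intent):
    """Enumerate all 2^|intent| subsets (assumed formulation, see lead-in)."""
    intent = sorted(intent)
    target = extent_of(set(intent))
    hits = sum(
        extent_of(set(sub)) == target
        for r in range(len(intent) + 1)
        for sub in combinations(intent, r)
    )
    return hits / 2 ** len(intent)


def mc_points(n, dim, rng):
    """Plain Monte Carlo: n independent uniform points in [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def lhs_points(n, dim, rng):
    """Latin hypercube: each coordinate has exactly one point per stratum."""
    columns = []
    for _ in range(dim):
        col = [(k + rng.random()) / n for k in range(n)]  # one draw per stratum
        rng.shuffle(col)  # decouple the strata across coordinates
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def estimate_stability(intent, points):
    """Map each point to a subset of the intent and count matching extents."""
    intent = sorted(intent)
    target = extent_of(set(intent))
    hits = 0
    for point in points:
        subset = {m for m, u in zip(intent, point) if u < 0.5}
        hits += extent_of(subset) == target
    return hits / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    intent = {"a", "b", "c"}
    print("exact:", exact_stability(intent))
    print("MCS  :", estimate_stability(intent, mc_points(200, 3, rng)))
    print("LHS  :", estimate_stability(intent, lhs_points(200, 3, rng)))
```

With an even number of samples, the Latin hypercube variant includes every attribute in exactly half of the sampled subsets, which is the per-coordinate stratification the summary refers to. Scrambled Sobol and Latin supercube sampling are not covered by this sketch.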
ISSN: 0166-218X, 1872-6771
DOI: 10.1016/j.dam.2019.03.002