Significant Subgraph Mining with Multiple Testing Correction
The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The problem of finding itemsets that are statistically significantly enriched
in a class of transactions is complicated by the need to correct for multiple
hypothesis testing. Pruning untestable hypotheses was recently proposed as a
strategy for this task of significant itemset mining. It was shown to lead to
greater statistical power, the discovery of more truly significant itemsets,
than the standard Bonferroni correction on real-world datasets. An open
question, however, is whether this strategy of excluding untestable hypotheses
also leads to greater statistical power in subgraph mining, in which the number
of hypotheses is much larger than in itemset mining. Here we answer this
question by an empirical investigation on eight popular graph benchmark
datasets. We propose a new efficient search strategy, which always returns the
same solution as the state-of-the-art approach and is approximately two orders
of magnitude faster. Moreover, we exploit the dependence between subgraphs by
considering the effective number of tests and thereby further increase the
statistical power. |
---|---|
DOI: | 10.48550/arxiv.1407.0316 |