A Tree-Based Contrast Set-Mining Approach to Detecting Group Differences
Published in: | INFORMS Journal on Computing 2014-03, Vol.26 (2), p.208-221 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Understanding differences between groups in a data set is one of the fundamental tasks in data analysis. As relevant applications accumulate, data-mining methods have been developed specifically to address the problem of group difference detection. Contrast set mining discovers group differences in the form of conjunctions of feature-value pairs, or items. In this paper, we incorporate absolute difference, relative difference, and statistical significance in our definition of a group difference, and develop a novel method named DIFF that uses a prefix-tree structure to compress the search space, follows a tree traversal procedure to discover the complete set of significant group differences, and employs efficient pruning strategies to expedite the search process. We conducted comprehensive experiments to compare our method with existing methods on completeness of results, pruning efficiency, and computational efficiency. The experiments demonstrate that our method guarantees completeness of results and achieves higher pruning efficiency and computational efficiency than STUCCO. In addition, our definition of group difference is more general than that of STUCCO. Our method is also more effective than traditional approaches, such as classification trees, in discovering the complete set of significant group differences. |
---|---|
ISSN: | 1091-9856, 1526-5528 |
DOI: | 10.1287/ijoc.2013.0558 |
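
The abstract names three criteria for a group difference (absolute difference, relative difference, and statistical significance) but this record does not give their formal definitions. The sketch below is one plausible reading, not the DIFF algorithm itself: it assumes absolute difference means the gap between the two groups' supports, relative difference means their ratio, and significance is a Pearson chi-square test on a 2x2 table. The function names and the thresholds `min_abs`, `min_ratio`, and `alpha` are hypothetical, and the prefix-tree search and pruning are not shown.

```python
import math

def count(rows, itemset):
    """Number of rows (dicts) containing every feature-value pair in itemset."""
    return sum(all(r.get(f) == v for f, v in itemset.items()) for r in rows)

def chi2_p_value_2x2(a, b, c, d):
    """p-value of Pearson's chi-square test (1 degree of freedom, no
    continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate table carries no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2))

def is_group_difference(group1, group2, itemset,
                        min_abs=0.1, min_ratio=2.0, alpha=0.05):
    """Check one candidate contrast set against all three assumed criteria."""
    if not group1 or not group2:
        return False
    a, c = count(group1, itemset), count(group2, itemset)
    s1, s2 = a / len(group1), c / len(group2)
    hi, lo = max(s1, s2), min(s1, s2)
    abs_ok = hi - lo >= min_abs                       # assumed absolute criterion
    ratio_ok = hi > 0 if lo == 0 else hi / lo >= min_ratio  # assumed relative criterion
    p = chi2_p_value_2x2(a, len(group1) - a, c, len(group2) - c)
    return abs_ok and ratio_ok and p < alpha

# Toy data: 60% of group 1 and 20% of group 2 contain the item smoker=yes.
g1 = [{"smoker": "yes"}] * 60 + [{"smoker": "no"}] * 40
g2 = [{"smoker": "yes"}] * 20 + [{"smoker": "no"}] * 80
result = is_group_difference(g1, g2, {"smoker": "yes"})
```

On this toy data the supports differ by 0.4 and by a factor of 3, so the candidate passes both assumed thresholds. A complete miner would apply such a check to every candidate conjunction reached during the tree traversal rather than to a single item.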