What is Unequal among the Equals? Ranking Equivalent Rules from Gene Expression Data

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2011-11, Vol. 23 (11), p. 1735-1747
Main authors:	Ruichu Cai; A. K. H. Tung; Zhenjie Zhang; Zhifeng Hao
Format:	Article
Language:	English
Subjects:	Accuracy; Applied sciences; Association rules; Biological and medical sciences; Computer science; control theory; systems; Data processing. List processing. Character string processing; Exact sciences and technology; Fundamental and applied biological sciences. Psychology; Gene expression; gene expression data; incremental mining framework; Itemsets; Lattices; Memory organisation. Data processing; Molecular and cellular biology; Molecular genetics; robust classification; Software; Upper bound

Description
Abstract:	In previous studies, association rules have been proven to be useful in classification problems over high dimensional gene expression data. However, due to the nature of such data sets, it is often the case that millions of rules can be derived such that many of them are covered by exactly the same set of training tuples and thus have exactly the same support and confidence. Ranking and selecting useful rules from such equivalent rule groups remain an interesting and unexplored problem. In this paper, we look at two interestingness measures for ranking the interestingness of rules within an equivalent rule group: Max-Subrule-Conf and Min-Subrule-Conf. Based on these interestingness measures, an incremental Apriori-like algorithm is designed to select more interesting rules from the lower bound rules of the group. Moreover, we present an improved classification model to fully exploit the potential of the selected rules. Our empirical studies on our proposed methods over five gene expression data sets show that our proposals improve both the efficiency and effectiveness of the rule extraction and classifier construction over gene expression data sets.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2010.207
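
Illustration (not part of the catalog record): the abstract's starting point is that many rules are covered by exactly the same training tuples and therefore share support and confidence. The Python sketch below shows that effect on a tiny, made-up data set. The rows, the gene names, and the helper functions `cover`, `rule_stats`, and `equivalent_groups` are all hypothetical; the sketch does not implement the paper's Max-Subrule-Conf or Min-Subrule-Conf measures or its mining algorithm, which this record does not define.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each row is (set of expressed genes, class label).
ROWS = [
    ({"g1", "g2", "g3"}, "tumor"),
    ({"g1", "g2", "g3"}, "tumor"),
    ({"g2", "g4"}, "normal"),
    ({"g1", "g2", "g3", "g4"}, "normal"),
]


def cover(itemset, rows):
    """Return the indices of the rows that contain every item of the itemset."""
    return frozenset(i for i, (items, _) in enumerate(rows) if itemset <= items)


def rule_stats(itemset, label, rows):
    """Return (support, confidence) of the rule `itemset -> label`."""
    covered = cover(itemset, rows)
    hits = sum(1 for i in covered if rows[i][1] == label)
    support = hits / len(rows)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def equivalent_groups(rows, max_len=3):
    """Group itemsets (up to max_len items) by the exact set of rows covering them."""
    all_items = sorted(set().union(*(items for items, _ in rows)))
    groups = defaultdict(list)
    for k in range(1, max_len + 1):
        for combo in combinations(all_items, k):
            itemset = frozenset(combo)
            covered = cover(itemset, rows)
            if covered:  # ignore itemsets that match no row
                groups[covered].append(itemset)
    return groups


GROUPS = equivalent_groups(ROWS)

if __name__ == "__main__":
    for covered, itemsets in GROUPS.items():
        # Every itemset in a group yields the same statistics for a given label.
        stats = {rule_stats(s, "tumor", ROWS) for s in itemsets}
        print(sorted(covered), len(itemsets), "itemsets, distinct stats:", len(stats))
```

Even with four rows and four genes, one group already holds six antecedents that cannot be told apart by support or confidence, which is the ranking problem the abstract describes. On real gene expression data, enumerating every itemset as done here would be infeasible; the brute-force loop is only for illustration.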