Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a s...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Annals of operations research 2018-04, Vol.263 (1-2), p.385-404
Hauptverfasser:	Trapp, Andrew C., Li, Chao, Flaherty, Patrick
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Biological activity Business and Management Cluster analysis Clusters Combinatorics Data mining Data Mining and Analytics Economic conditions Formulations Gene expression Genes Integer programming Mathematical analysis Mathematical models Mathematical programming Matrices (Mathematics) Matrix methods Operations research Operations Research/Decision Theory Recommender systems Studies Theory of Computation
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput Biol 10(3–4):373–384, 2003 ) developed an approach to identify order-preserving submatrices (OPSMs) in which the expression levels of included genes induce the sample linear ordering of experiments. In addition to gene expression analysis, OPSMs have application to recommender systems and target marketing. While the problem of finding the largest OPSM is N P -hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms to recover, for any given level of significance, all GOPSMs from a given data matrix, by iteratively solving mathematical programming formulations to global optimality. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets showing the capability of our developments.
ISSN:	0254-5330 1572-9338
DOI:	10.1007/s10479-016-2173-9