Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Bibliographic details
Published in: Annals of operations research, 2018-04, Vol. 263 (1-2), pp. 385-404
Main authors: Trapp, Andrew C., Li, Chao, Flaherty, Patrick
Format: Article
Language: eng
Online access: Full text
container_end_page 404
container_issue 1-2
container_start_page 385
container_title Annals of operations research
container_volume 263
creator Trapp, Andrew C.
Li, Chao
Flaherty, Patrick
description Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput Biol 10(3–4):373–384, 2003) developed an approach to identify order-preserving submatrices (OPSMs) in which the expression levels of included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have applications in recommender systems and target marketing. While the problem of finding the largest OPSM is NP-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms to recover, for any given level of significance, all GOPSMs from a given data matrix, by iteratively solving mathematical programming formulations to global optimality. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets, showing the capability of our developments.
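The OPSM and GOPSM definitions in the description can be illustrated with a small brute-force sketch. This is not the authors' method, which solves exact mathematical programming formulations; it only checks the defining property and enumerates patterns by exhaustive search over column subsets, which is feasible only for very small matrices since finding the largest OPSM is NP-hard. The function names are hypothetical, strict orderings are assumed (ties disqualify a row), and the paper's significance-level margin constraint is not modeled.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def _strictly_monotone(values: Sequence[float], increasing: bool) -> bool:
    """True if values are strictly increasing (or strictly decreasing)."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def is_order_preserving(data: Matrix, rows: Sequence[int], cols: Sequence[int],
                        allow_reverse: bool = True) -> bool:
    """Hypothetical check: is the submatrix (rows x cols) an OPSM
    (allow_reverse=False) or a GOPSM (allow_reverse=True)?"""
    rows, cols = list(rows), list(cols)
    if not rows:
        return True
    # The first row fixes the candidate linear ordering of the columns.
    reference = data[rows[0]]
    order = sorted(cols, key=lambda c: reference[c])
    for r in rows:
        values = [data[r][c] for c in order]
        if _strictly_monotone(values, increasing=True):
            continue
        # GOPSM: a row may instead follow the reverse ordering.
        if allow_reverse and _strictly_monotone(values, increasing=False):
            continue
        return False
    return True


def canonical_pattern(row: Sequence[float], cols: Sequence[int],
                      allow_reverse: bool) -> Optional[tuple[int, ...]]:
    """Column ordering induced by a row, or None if the row has ties on cols."""
    values = [row[c] for c in cols]
    if len(set(values)) < len(values):
        return None
    order = tuple(sorted(cols, key=lambda c: row[c]))
    if allow_reverse:
        # Identify an ordering with its reverse so both land in one group.
        order = min(order, order[::-1])
    return order


def enumerate_patterns(data: Matrix, min_rows: int = 2, min_cols: int = 3,
                       allow_reverse: bool = True):
    """Exhaustive search: for every column subset with >= min_cols columns,
    group the rows sharing the same (canonical) induced ordering and report
    each group with >= min_rows rows. Cost is exponential in the column count."""
    n_cols = len(data[0]) if data else 0
    results = []
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            groups = defaultdict(list)
            for r, row in enumerate(data):
                pattern = canonical_pattern(row, cols, allow_reverse)
                if pattern is not None:
                    groups[pattern].append(r)
            for pattern, members in groups.items():
                if len(members) >= min_rows:
                    results.append((tuple(members), pattern))
    return results
```

For example, with rows `[1, 2, 3, 4]`, `[2, 5, 7, 9]`, `[9, 6, 4, 1]` and `[3, 1, 4, 2]`, the first three rows form a GOPSM on all four columns (the third row follows the reverse ordering) but not an OPSM, while the fourth row fits neither pattern.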
doi_str_mv 10.1007/s10479-016-2173-9
format Article
publisher New York: Springer US
rights Springer Science+Business Media New York 2016; COPYRIGHT 2018 Springer
status peer_reviewed
pages 20
linktopdf https://link.springer.com/content/pdf/10.1007/s10479-016-2173-9
linktohtml https://link.springer.com/10.1007/s10479-016-2173-9
fulltext fulltext
identifier ISSN: 0254-5330
ispartof Annals of operations research, 2018-04, Vol.263 (1-2), p.385-404
issn 0254-5330
1572-9338
language eng
recordid cdi_proquest_journals_2015610444
source SpringerLink Journals; EBSCOhost Business Source Complete
subjects Algorithms
Biological activity
Business and Management
Cluster analysis
Clusters
Combinatorics
Data mining
Data Mining and Analytics
Economic conditions
Formulations
Gene expression
Genes
Integer programming
Mathematical analysis
Mathematical models
Mathematical programming
Matrices (Mathematics)
Matrix methods
Operations research
Operations Research/Decision Theory
Recommender systems
Studies
Theory of Computation
title Recovering all generalized order-preserving submatrices: new exact formulations and algorithms
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T07%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recovering%20all%20generalized%20order-preserving%20submatrices:%20new%20exact%20formulations%20and%20algorithms&rft.jtitle=Annals%20of%20operations%20research&rft.au=Trapp,%20Andrew%20C.&rft.date=2018-04-01&rft.volume=263&rft.issue=1-2&rft.spage=385&rft.epage=404&rft.pages=385-404&rft.issn=0254-5330&rft.eissn=1572-9338&rft_id=info:doi/10.1007/s10479-016-2173-9&rft_dat=%3Cgale_proqu%3EA531094324%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2015610444&rft_id=info:pmid/&rft_galeid=A531094324&rfr_iscdi=true