Recovering all generalized order-preserving submatrices: new exact formulations and algorithms
Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a s...
Published in: | Annals of operations research 2018-04, Vol.263 (1-2), p.385-404 |
---|---|
Main authors: | Trapp, Andrew C. ; Li, Chao ; Flaherty, Patrick |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 404 |
---|---|
container_issue | 1-2 |
container_start_page | 385 |
container_title | Annals of operations research |
container_volume | 263 |
creator | Trapp, Andrew C. ; Li, Chao ; Flaherty, Patrick |
description | Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput Biol 10(3–4):373–384, 2003) developed an approach to identify order-preserving submatrices (OPSMs) in which the expression levels of included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have application to recommender systems and target marketing. While the problem of finding the largest OPSM is NP-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms to recover, for any given level of significance, all GOPSMs from a given data matrix, by iteratively solving mathematical programming formulations to global optimality. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets, showing the capability of our developments. |
doi_str_mv | 10.1007/s10479-016-2173-9 |
format | Article |
publisher | New York: Springer US |
rights | Springer Science+Business Media New York 2016 ; COPYRIGHT 2018 Springer ; Annals of Operations Research is a copyright of Springer, (2016). All Rights Reserved. |
addtitle | Ann Oper Res |
tpages | 20 |
toplevel | peer_reviewed ; online_resources |
collection | CrossRef ; Gale Business: Insights ; ProQuest Central (Corporate) ; Materials Business File ; Mechanical & Transportation Engineering Abstracts ; ABI/INFORM Collection ; ABI/INFORM Global (PDF only) ; ProQuest Central (purchase pre-March 2016) ; ABI/INFORM Global (Alumni Edition) ; Science Database (Alumni Edition) ; Computing Database (Alumni Edition) ; ProQuest Pharma Collection ; Technology Research Database ; ProQuest SciTech Collection ; ProQuest Technology Collection ; ProQuest Central (Alumni) (purchase pre-March 2016) ; ABI/INFORM Collection (Alumni Edition) ; Materials Science & Engineering Collection ; ProQuest Central (Alumni Edition) ; ProQuest Central UK/Ireland ; Advanced Technologies & Aerospace Collection ; ProQuest Central Essentials ; ProQuest Central ; Business Premium Collection ; Technology Collection ; ProQuest One Community College ; ProQuest Central Korea ; Engineering Research Database ; Business Premium Collection (Alumni) ; ABI/INFORM Global (Corporate) ; ProQuest Central Student ; SciTech Premium Collection ; Materials Research Database ; ProQuest Computer Science Collection ; ProQuest Business Collection (Alumni Edition) ; ProQuest Business Collection ; Computer Science Database ; Civil Engineering Abstracts ; ABI/INFORM Professional Advanced ; ProQuest Engineering Collection ; ABI/INFORM Global ; Computing Database ; Science Database ; Engineering Database ; Advanced Technologies & Aerospace Database ; ProQuest Advanced Technologies & Aerospace Collection ; One Business (ProQuest) ; ProQuest One Business (Alumni) ; ProQuest One Academic Eastern Edition (DO NOT USE) ; ProQuest One Academic ; ProQuest One Academic UKI Edition ; Engineering Collection ; ProQuest Central Basic |
fulltext | fulltext |
identifier | ISSN: 0254-5330 |
ispartof | Annals of operations research, 2018-04, Vol.263 (1-2), p.385-404 |
issn | 0254-5330 ; 1572-9338 |
language | eng |
recordid | cdi_proquest_journals_2015610444 |
source | SpringerLink Journals; EBSCOhost Business Source Complete |
subjects | Algorithms ; Biological activity ; Business and Management ; Cluster analysis ; Clusters ; Combinatorics ; Data mining ; Data Mining and Analytics ; Economic conditions ; Formulations ; Gene expression ; Genes ; Integer programming ; Mathematical analysis ; Mathematical models ; Mathematical programming ; Matrices (Mathematics) ; Matrix methods ; Operations research ; Operations Research/Decision Theory ; Recommender systems ; Studies ; Theory of Computation |
title | Recovering all generalized order-preserving submatrices: new exact formulations and algorithms |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T07%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recovering%20all%20generalized%20order-preserving%20submatrices:%20new%20exact%20formulations%20and%20algorithms&rft.jtitle=Annals%20of%20operations%20research&rft.au=Trapp,%20Andrew%20C.&rft.date=2018-04-01&rft.volume=263&rft.issue=1-2&rft.spage=385&rft.epage=404&rft.pages=385-404&rft.issn=0254-5330&rft.eissn=1572-9338&rft_id=info:doi/10.1007/s10479-016-2173-9&rft_dat=%3Cgale_proqu%3EA531094324%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2015610444&rft_id=info:pmid/&rft_galeid=A531094324&rfr_iscdi=true |
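
The description field of this record defines an OPSM as a set of genes (rows) whose expression levels induce one common linear ordering of a set of experiments (columns), and a GOPSM as the variant that also admits the reverse ordering. As a rough illustration only, the following Python sketch checks whether a given choice of rows and columns satisfies those two patterns. It is not the authors' method: the article finds such submatrices with exact mathematical programming formulations, whereas this code merely verifies a candidate. The function names, the strict-inequality reading of "ordering", and the example matrix are all assumptions made for the illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def column_order(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> List[int]:
    """Sort the chosen columns by the values of the first chosen row.

    If any common strict ordering exists, the first row must follow it
    (or its reverse), so this is the only ordering that needs checking.
    """
    if not rows:
        return list(cols)
    first = matrix[rows[0]]
    return sorted(cols, key=lambda c: first[c])


def strictly_increasing(values: Sequence[float]) -> bool:
    return all(a < b for a, b in zip(values, values[1:]))


def is_opsm(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """True if every chosen row is strictly increasing under one column order."""
    order = column_order(matrix, rows, cols)
    return all(strictly_increasing([matrix[r][c] for c in order]) for r in rows)


def is_gopsm(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """True if every chosen row follows the common column order or its reverse.

    Assumption: "allowing for the reverse linear ordering" is read as
    "each row is strictly increasing or strictly decreasing".
    """
    order = column_order(matrix, rows, cols)
    for r in rows:
        values = [matrix[r][c] for c in order]
        if not (strictly_increasing(values) or strictly_increasing(values[::-1])):
            return False
    return True


# Hypothetical 4 x 4 expression matrix (rows = genes, columns = experiments).
EXAMPLE = [
    [1, 5, 3, 9],
    [2, 8, 4, 7],
    [9, 2, 6, 1],
    [4, 4, 0, 3],
]

if __name__ == "__main__":
    print(is_opsm(EXAMPLE, [0, 1], [0, 1, 2]))
    print(is_opsm(EXAMPLE, [0, 1, 2], [0, 1, 2]))
    print(is_gopsm(EXAMPLE, [0, 1, 2], [0, 1, 2]))
    print(is_gopsm(EXAMPLE, [0, 3], [0, 1, 2]))
```

Checking a candidate is cheap (one sort plus a linear pass per row); the hard part, as the description notes, is finding the largest such submatrix, which is NP-hard.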