Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map

Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective...

Bibliographic details
Published in: Bioinformatics (Oxford, England), 2021-09, Vol.37 (18), p.2889-2895
Main authors: Blasco, Andrea, Natoli, Ted, Endres, Michael G, Sergeev, Rinat A, Randazzo, Steven, Paik, Jin H, Macaluso, N J Maximilian, Narayan, Rajiv, Lu, Xiaodong, Peck, David, Lakhani, Karim R, Subramanian, Aravind
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2895
container_issue 18
container_start_page 2889
container_title Bioinformatics (Oxford, England)
container_volume 37
creator Blasco, Andrea
Natoli, Ted
Endres, Michael G
Sergeev, Rinat A
Randazzo, Steven
Paik, Jin H
Macaluso, N J Maximilian
Narayan, Rajiv
Lu, Xiaodong
Peck, David
Lakhani, Karim R
Subramanian, Aravind
description Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. The data are freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/btab192
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8479655</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2509606303</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-d5783dd8d30926db61b0ad3197e1199f4aba91d4fbc3ae0121c51443a2a8307f3</originalsourceid><addsrcrecordid>eNpVUUtP3DAQtipQef4F5GMvC3acODEHpAq1gITUCz1bE9vZdZV4UtsJ2n9P6C6rcprRfK-RPkKuOLvmTImb1qMPHcYBsjfpps3QclV8IadcyHpVNpwfHXYmTshZSn8YYxWr5FdyIkRTlKoqT8nr0zBGnH1YU-sMhhn7KXsMdHB5gzZRH-iS1eN6S_Mm4rTeUBxdWO4BZ_hHNTiMLvv3Pd1SCBTGsfdmB2ZcdG7hhOBM9rPPWzrAeEGOO-iTu9zPc_L754-X-8fV86-Hp_vvzysjJM8rW9WNsLaxgqlC2lbyloEVXNWOc6W6ElpQ3JZdawQ4xgtuKl6WAgpoBKs7cU7udr7j1A7OGhdyhF6P0Q8QtxrB689I8Bu9xlk3Za1kVS0G3_YGEf9OLmU9-GRc30NwOCVdVExJJgUTC1XuqCZiStF1hxjO9Htr-nNret_aIrz6_8mD7KMm8QZY9J6S</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2509606303</pqid></control><display><type>article</type><title>Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Blasco, Andrea ; Natoli, Ted ; Endres, Michael G ; Sergeev, Rinat A ; Randazzo, Steven ; Paik, Jin H ; Macaluso, N J Maximilian ; Narayan, Rajiv ; Lu, Xiaodong ; Peck, David ; Lakhani, Karim R ; Subramanian, Aravind</creator><contributor>Mathelier, Anthony</contributor><creatorcontrib>Blasco, Andrea ; Natoli, Ted ; Endres, Michael G ; Sergeev, Rinat A ; Randazzo, Steven ; Paik, Jin H ; Macaluso, N J Maximilian ; Narayan, Rajiv ; Lu, Xiaodong ; Peck, David ; Lakhani, Karim R ; Subramanian, Aravind ; Mathelier, Anthony</creatorcontrib><description>Do machine learning methods improve standard 
deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btab192</identifier><identifier>PMID: 33824954</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Biology ; Original Papers ; Random Forest ; Reproducibility of Results ; Software</subject><ispartof>Bioinformatics (Oxford, England), 2021-09, Vol.37 (18), p.2889-2895</ispartof><rights>The Author(s) 2021. Published by Oxford University Press.</rights><rights>The Author(s) 2021. Published by Oxford University Press. 
2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c361t-d5783dd8d30926db61b0ad3197e1199f4aba91d4fbc3ae0121c51443a2a8307f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479655/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479655/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33824954$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Mathelier, Anthony</contributor><creatorcontrib>Blasco, Andrea</creatorcontrib><creatorcontrib>Natoli, Ted</creatorcontrib><creatorcontrib>Endres, Michael G</creatorcontrib><creatorcontrib>Sergeev, Rinat A</creatorcontrib><creatorcontrib>Randazzo, Steven</creatorcontrib><creatorcontrib>Paik, Jin H</creatorcontrib><creatorcontrib>Macaluso, N J Maximilian</creatorcontrib><creatorcontrib>Narayan, Rajiv</creatorcontrib><creatorcontrib>Lu, Xiaodong</creatorcontrib><creatorcontrib>Peck, David</creatorcontrib><creatorcontrib>Lakhani, Karim R</creatorcontrib><creatorcontrib>Subramanian, Aravind</creatorcontrib><title>Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. 
The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. 
Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Biology</subject><subject>Original Papers</subject><subject>Random Forest</subject><subject>Reproducibility of Results</subject><subject>Software</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVUUtP3DAQtipQef4F5GMvC3acODEHpAq1gITUCz1bE9vZdZV4UtsJ2n9P6C6rcprRfK-RPkKuOLvmTImb1qMPHcYBsjfpps3QclV8IadcyHpVNpwfHXYmTshZSn8YYxWr5FdyIkRTlKoqT8nr0zBGnH1YU-sMhhn7KXsMdHB5gzZRH-iS1eN6S_Mm4rTeUBxdWO4BZ_hHNTiMLvv3Pd1SCBTGsfdmB2ZcdG7hhOBM9rPPWzrAeEGOO-iTu9zPc_L754-X-8fV86-Hp_vvzysjJM8rW9WNsLaxgqlC2lbyloEVXNWOc6W6ElpQ3JZdawQ4xgtuKl6WAgpoBKs7cU7udr7j1A7OGhdyhF6P0Q8QtxrB689I8Bu9xlk3Za1kVS0G3_YGEf9OLmU9-GRc30NwOCVdVExJJgUTC1XuqCZiStF1hxjO9Htr-nNret_aIrz6_8mD7KMm8QZY9J6S</recordid><startdate>20210929</startdate><enddate>20210929</enddate><creator>Blasco, Andrea</creator><creator>Natoli, Ted</creator><creator>Endres, Michael G</creator><creator>Sergeev, Rinat A</creator><creator>Randazzo, Steven</creator><creator>Paik, Jin H</creator><creator>Macaluso, N J Maximilian</creator><creator>Narayan, Rajiv</creator><creator>Lu, Xiaodong</creator><creator>Peck, David</creator><creator>Lakhani, Karim R</creator><creator>Subramanian, Aravind</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20210929</creationdate><title>Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map</title><author>Blasco, Andrea ; Natoli, Ted ; Endres, Michael G ; Sergeev, Rinat A ; Randazzo, Steven ; Paik, Jin H ; Macaluso, N J Maximilian ; Narayan, Rajiv ; Lu, Xiaodong ; Peck, 
David ; Lakhani, Karim R ; Subramanian, Aravind</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-d5783dd8d30926db61b0ad3197e1199f4aba91d4fbc3ae0121c51443a2a8307f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Biology</topic><topic>Original Papers</topic><topic>Random Forest</topic><topic>Reproducibility of Results</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Blasco, Andrea</creatorcontrib><creatorcontrib>Natoli, Ted</creatorcontrib><creatorcontrib>Endres, Michael G</creatorcontrib><creatorcontrib>Sergeev, Rinat A</creatorcontrib><creatorcontrib>Randazzo, Steven</creatorcontrib><creatorcontrib>Paik, Jin H</creatorcontrib><creatorcontrib>Macaluso, N J Maximilian</creatorcontrib><creatorcontrib>Narayan, Rajiv</creatorcontrib><creatorcontrib>Lu, Xiaodong</creatorcontrib><creatorcontrib>Peck, David</creatorcontrib><creatorcontrib>Lakhani, Karim R</creatorcontrib><creatorcontrib>Subramanian, Aravind</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Blasco, Andrea</au><au>Natoli, Ted</au><au>Endres, Michael G</au><au>Sergeev, Rinat A</au><au>Randazzo, Steven</au><au>Paik, Jin H</au><au>Macaluso, N J Maximilian</au><au>Narayan, Rajiv</au><au>Lu, Xiaodong</au><au>Peck, David</au><au>Lakhani, Karim R</au><au>Subramanian, Aravind</au><au>Mathelier, 
Anthony</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2021-09-29</date><risdate>2021</risdate><volume>37</volume><issue>18</issue><spage>2889</spage><epage>2895</epage><pages>2889-2895</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. 
Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>33824954</pmid><doi>10.1093/bioinformatics/btab192</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics (Oxford, England), 2021-09, Vol.37 (18), p.2889-2895
issn 1367-4803
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8479655
source MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects Algorithms
Biology
Original Papers
Random Forest
Reproducibility of Results
Software
title Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T09%3A52%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20deconvolution%20methods%20in%20biology%20through%20open%20innovation%20competitions:%20an%20application%20to%20the%20connectivity%20map&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Blasco,%20Andrea&rft.date=2021-09-29&rft.volume=37&rft.issue=18&rft.spage=2889&rft.epage=2895&rft.pages=2889-2895&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btab192&rft_dat=%3Cproquest_pubme%3E2509606303%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2509606303&rft_id=info:pmid/33824954&rfr_iscdi=true