Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies
Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in...
Published in: | Journal of chemical information and modeling 2021-04, Vol.61 (4), p.1603-1616 |
---|---|
Main authors: | Martin, Eric J ; Zhu, Xiang-Wei |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1616 |
---|---|
container_issue | 4 |
container_start_page | 1603 |
container_title | Journal of chemical information and modeling |
container_volume | 61 |
creator | Martin, Eric J ; Zhu, Xiang-Wei |
description | Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. However, a company’s compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators’ models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unlikely scenario of directly sharing all chemical structures and assay data. |
doi_str_mv | 10.1021/acs.jcim.0c01342 |
format | Article |
fullrecord | Martin, Eric J ; Zhu, Xiang-Wei. Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies. Journal of chemical information and modeling (J. Chem. Inf. Model), 2021-04-26, Vol.61 (4), p.1603-1616, 14 pages. Publisher: American Chemical Society, Washington. ISSN: 1549-9596 ; EISSN: 1549-960X ; DOI: 10.1021/acs.jcim.0c01342 ; PMID: 33844519. ORCID: https://orcid.org/0000-0001-7040-5108 ; https://orcid.org/0000-0002-1894-7679. Rights: 2021 American Chemical Society. Peer reviewed; free to read. |
fulltext | fulltext |
identifier | ISSN: 1549-9596 |
ispartof | Journal of chemical information and modeling, 2021-04, Vol.61 (4), p.1603-1616 |
issn | 1549-9596 ; 1549-960X |
language | eng |
recordid | cdi_acs_journals_10_1021_acs_jcim_0c01342 |
source | ACS Publications; Web of Science - Science Citation Index Expanded - 2021 |
subjects | Acquisitions & mergers ; Assaying ; Chemistry ; Chemistry, Medicinal ; Chemistry, Multidisciplinary ; Collaboration ; Computer Science ; Computer Science, Information Systems ; Computer Science, Interdisciplinary Applications ; Life Sciences & Biomedicine ; Machine Learning and Deep Learning ; Modelling ; Pharmacology & Pharmacy ; Physical Sciences ; Science & Technology ; Technology ; Trade secrets |
title | Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T00%3A02%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acs_j&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Collaborative%20Profile-QSAR:%20A%20Natural%20Platform%20for%20Building%20Collaborative%20Models%20among%20Competing%20Companies&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Martin,%20Eric%20J&rft.date=2021-04-26&rft.volume=61&rft.issue=4&rft.spage=1603&rft.epage=1616&rft.pages=1603-1616&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.0c01342&rft_dat=%3Cproquest_acs_j%3E2524943150%3C/proquest_acs_j%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2524943150&rft_id=info:pmid/33844519&rfr_iscdi=true |
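
The description in this record characterizes pQSAR as a two-level stacked model in which level-1 predictions from single-task models form a profile that serves as the descriptor for a level-2 model, and in which collaborators exchange only fitted models. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single numeric descriptor per compound, swaps in ordinary linear fits for level 1 and a small ridge regression for level 2 (the record does not say which learners the paper uses), and all function names and data values are hypothetical.

```python
def fit_level1(descriptors, activities):
    """Fit one single-task model for one assay (assumed here: a 1-D linear fit)."""
    mx = sum(descriptors) / len(descriptors)
    my = sum(activities) / len(activities)
    sxy = sum((x - mx) * (y - my) for x, y in zip(descriptors, activities))
    sxx = sum((x - mx) ** 2 for x in descriptors)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda d: slope * d + intercept


def profile(descriptor, models):
    """Level-1 predictions from every available model form the compound's profile."""
    return [m(descriptor) for m in models]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_level2(profiles, activities, ridge=1e-3):
    """Ridge regression of measured activity on the profile plus a bias column."""
    rows = [p + [1.0] for p in profiles]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(k)]
    w = solve(xtx, xty)
    return lambda p: sum(wi * xi for wi, xi in zip(w, p + [1.0]))


# Two hypothetical companies each train a level-1 model on private, made-up data.
models_a = [fit_level1([0.0, 1.0, 2.0, 3.0], [0.1, 1.1, 1.9, 3.2])]
models_b = [fit_level1([0.0, 1.0, 2.0, 3.0], [5.0, 4.1, 2.9, 2.0])]

# Only the fitted models change hands; the training data stays with its owner.
shared = models_a + models_b

# Company A builds profiles for its own compounds and trains a level-2 model.
train_descriptors = [0.5, 1.5, 2.5, 3.5]
train_activity = [1.0, 2.0, 3.0, 4.0]
level2 = fit_level2([profile(d, shared) for d in train_descriptors], train_activity)
prediction = level2(profile(2.0, shared))
```

The ridge term is there only because the toy profile columns are collinear (both are linear functions of the same descriptor); it keeps the normal equations solvable and says nothing about the regularization used in the paper.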