Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies

Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in...

Bibliographic details
Published in: Journal of chemical information and modeling 2021-04, Vol.61 (4), p.1603-1616
Main authors: Martin, Eric J; Zhu, Xiang-Wei
Format: Article
Language: eng
Online access: Full text
container_end_page 1616
container_issue 4
container_start_page 1603
container_title Journal of chemical information and modeling
container_volume 61
creator Martin, Eric J
Zhu, Xiang-Wei
description Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. However, a company’s compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators’ models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. 
Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unli
doi_str_mv 10.1021/acs.jcim.0c01342
format Article
fullrecord <record><control><sourceid>proquest_acs_j</sourceid><recordid>TN_cdi_acs_journals_10_1021_acs_jcim_0c01342</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2524943150</sourcerecordid><originalsourceid>FETCH-LOGICAL-a406t-ec0ae7b0d7b048a311effa1fa789e51303c6c16da92f5d5d7e131b0ffe2555333</originalsourceid><addsrcrecordid>eNqNkc1v1DAQxS0EoqVw54QicUGCLDP-yCbclqi0SAXKl8Qtcpwx8iqJFzsp4r-vt7tbqUhIHGyP5N8bvZnH2FOEBQLH19rExdq4YQEGUEh-jx2jklVeFfDj_qFWVXHEHsW4BhCiKvhDdiREKaXC6pita9_3uvVBT-6Kssvgresp__x19eVNtso-6mkOus8uez1ZH4YsXdnb2fWdG39md7UffEd9zPTgb76GDU1uX-nRUXzMHljdR3qyf0_Y93en3-rz_OLT2ft6dZFrCcWUkwFNyxa6dGSpBSJZq9HqZVmRQgHCFAaLTlfcqk51S0KBLVhLXCklhDhhL3Z9N8H_milOzeCioWR1JD_HhivkgqsCt-jzv9C1n8OY3CWKy0oKVJAo2FEm-BgD2WYT3KDDnwah2ebQpByabQ7NPockebZvPLcDdbeCw-ITUO6A39R6G42j0dAtBgCFlEteFqlCrN2UVuzH2s_jlKQv_1-a6Fc7-sbjYbp_Gr8G-Ge0ug</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2524943150</pqid></control><display><type>article</type><title>Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies</title><source>ACS Publications</source><source>Web of Science - Science Citation Index Expanded - 2021&lt;img src="https://exlibris-pub.s3.amazonaws.com/fromwos-v2.jpg" /&gt;</source><creator>Martin, Eric J ; Zhu, Xiang-Wei</creator><creatorcontrib>Martin, Eric J ; Zhu, Xiang-Wei</creatorcontrib><description>Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. 
If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. However, a company’s compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators’ models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. 
Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unlikely scenario of directly sharing all chemical structures and assay data.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/acs.jcim.0c01342</identifier><identifier>PMID: 33844519</identifier><language>eng</language><publisher>WASHINGTON: American Chemical Society</publisher><subject>Acquisitions &amp; mergers ; Assaying ; Chemistry ; Chemistry, Medicinal ; Chemistry, Multidisciplinary ; Collaboration ; Computer Science ; Computer Science, Information Systems ; Computer Science, Interdisciplinary Applications ; Life Sciences &amp; Biomedicine ; Machine Learning and Deep Learning ; Modelling ; Pharmacology &amp; Pharmacy ; Physical Sciences ; Science &amp; Technology ; Technology ; Trade secrets</subject><ispartof>Journal of chemical information and modeling, 2021-04, Vol.61 (4), p.1603-1616</ispartof><rights>2021 American Chemical Society</rights><rights>Copyright American Chemical Society Apr 26, 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>10</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000644728600011</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-a406t-ec0ae7b0d7b048a311effa1fa789e51303c6c16da92f5d5d7e131b0ffe2555333</citedby><cites>FETCH-LOGICAL-a406t-ec0ae7b0d7b048a311effa1fa789e51303c6c16da92f5d5d7e131b0ffe2555333</cites><orcidid>0000-0001-7040-5108 ; 
0000-0002-1894-7679</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c01342$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jcim.0c01342$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>315,781,785,2766,27081,27929,27930,39263,56743,56793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33844519$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Martin, Eric J</creatorcontrib><creatorcontrib>Zhu, Xiang-Wei</creatorcontrib><title>Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies</title><title>Journal of chemical information and modeling</title><addtitle>J CHEM INF MODEL</addtitle><addtitle>J. Chem. Inf. Model</addtitle><description>Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. However, a company’s compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators’ models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. 
This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. 
Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unlikely scenario of directly sharing all chemical structures and assay data.</description><subject>Acquisitions &amp; mergers</subject><subject>Assaying</subject><subject>Chemistry</subject><subject>Chemistry, Medicinal</subject><subject>Chemistry, Multidisciplinary</subject><subject>Collaboration</subject><subject>Computer Science</subject><subject>Computer Science, Information Systems</subject><subject>Computer Science, Interdisciplinary Applications</subject><subject>Life Sciences &amp; Biomedicine</subject><subject>Machine Learning and Deep Learning</subject><subject>Modelling</subject><subject>Pharmacology &amp; Pharmacy</subject><subject>Physical Sciences</subject><subject>Science &amp; Technology</subject><subject>Technology</subject><subject>Trade secrets</subject><issn>1549-9596</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>HGBXW</sourceid><recordid>eNqNkc1v1DAQxS0EoqVw54QicUGCLDP-yCbclqi0SAXKl8Qtcpwx8iqJFzsp4r-vt7tbqUhIHGyP5N8bvZnH2FOEBQLH19rExdq4YQEGUEh-jx2jklVeFfDj_qFWVXHEHsW4BhCiKvhDdiREKaXC6pita9_3uvVBT-6Kssvgresp__x19eVNtso-6mkOus8uez1ZH4YsXdnb2fWdG39md7UffEd9zPTgb76GDU1uX-nRUXzMHljdR3qyf0_Y93en3-rz_OLT2ft6dZFrCcWUkwFNyxa6dGSpBSJZq9HqZVmRQgHCFAaLTlfcqk51S0KBLVhLXCklhDhhL3Z9N8H_milOzeCioWR1JD_HhivkgqsCt-jzv9C1n8OY3CWKy0oKVJAo2FEm-BgD2WYT3KDDnwah2ebQpByabQ7NPockebZvPLcDdbeCw-ITUO6A39R6G42j0dAtBgCFlEteFqlCrN2UVuzH2s_jlKQv_1-a6Fc7-sbjYbp_Gr8G-Ge0ug</recordid><startdate>20210426</startdate><enddate>20210426</enddate><creator>Martin, Eric J</creator><creator>Zhu, Xiang-Wei</creator><general>American Chemical Society</general><general>Amer Chemical 
Soc</general><scope>BLEPL</scope><scope>DTL</scope><scope>HGBXW</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-7040-5108</orcidid><orcidid>https://orcid.org/0000-0002-1894-7679</orcidid></search><sort><creationdate>20210426</creationdate><title>Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies</title><author>Martin, Eric J ; Zhu, Xiang-Wei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a406t-ec0ae7b0d7b048a311effa1fa789e51303c6c16da92f5d5d7e131b0ffe2555333</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Acquisitions &amp; mergers</topic><topic>Assaying</topic><topic>Chemistry</topic><topic>Chemistry, Medicinal</topic><topic>Chemistry, Multidisciplinary</topic><topic>Collaboration</topic><topic>Computer Science</topic><topic>Computer Science, Information Systems</topic><topic>Computer Science, Interdisciplinary Applications</topic><topic>Life Sciences &amp; Biomedicine</topic><topic>Machine Learning and Deep Learning</topic><topic>Modelling</topic><topic>Pharmacology &amp; Pharmacy</topic><topic>Physical Sciences</topic><topic>Science &amp; Technology</topic><topic>Technology</topic><topic>Trade secrets</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Martin, Eric J</creatorcontrib><creatorcontrib>Zhu, Xiang-Wei</creatorcontrib><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>Web of Science - Science Citation Index Expanded - 2021</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer 
and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of chemical information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Martin, Eric J</au><au>Zhu, Xiang-Wei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies</atitle><jtitle>Journal of chemical information and modeling</jtitle><stitle>J CHEM INF MODEL</stitle><addtitle>J. Chem. Inf. Model</addtitle><date>2021-04-26</date><risdate>2021</risdate><volume>61</volume><issue>4</issue><spage>1603</spage><epage>1616</epage><pages>1603-1616</pages><issn>1549-9596</issn><eissn>1549-960X</eissn><abstract>Massively multitask bioactivity models that transfer learning between thousands of assays have been shown to work dramatically better than separate models trained on each individual assay. In particular, the applicability domain for a given model can expand from compounds similar to those tested in that specific assay to those tested across the full complement of contributing assays. If many large companies would share their assay data and train models on the superset, predictions should be better than what each company can do alone. 
However, a company’s compounds, targets, and activities are among their most guarded trade secrets. Strategies have been proposed to share just the individual collaborators’ models, without exposing any of the training data. Profile-QSAR (pQSAR) is a two-level, multitask, stacked model. It uses profiles of level-1 predictions from single-task models for thousands of assays as compound descriptors for level-2 models. This work describes its simple and natural adaptation to safe collaboration by model sharing. Broad model sharing has not yet been implemented across multiple large companies, so there are numerous unanswered questions. Novartis was formed from several mergers and acquisitions. In principle, this should allow an internal simulation of model sharing. In practice, the lack of metadata about the origins of compounds and assays made this difficult. Nevertheless, we have attempted to simulate this process and propose some findings: multitask pQSAR is always an improvement over single-task models; collaborative multitask modeling did not improve predictions on internal compounds; collaboration did improve predictions for external compounds but far less than the purely internal multitask modeling for internal compounds; collaborative models for external compounds increasingly improve as overlap between compound collections increases; combining profiles from inside and outside the company is not best, with internal predictions better using only the inside profile and external using only the outside profile, but a consensus of models using all three profiles is best on external compounds and a good compromise on internal compounds. We anticipate similar results from other model-sharing approaches. 
Indeed, since collaborative pQSAR through model sharing is mathematically identical to pQSAR using actual shared data, we believe our conclusions should apply to collaborative modeling by any current method even including the unlikely scenario of directly sharing all chemical structures and assay data.</abstract><cop>WASHINGTON</cop><pub>American Chemical Society</pub><pmid>33844519</pmid><doi>10.1021/acs.jcim.0c01342</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-7040-5108</orcidid><orcidid>https://orcid.org/0000-0002-1894-7679</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2021-04, Vol.61 (4), p.1603-1616
issn 1549-9596
1549-960X
language eng
recordid cdi_acs_journals_10_1021_acs_jcim_0c01342
source ACS Publications; Web of Science - Science Citation Index Expanded - 2021
subjects Acquisitions & mergers
Assaying
Chemistry
Chemistry, Medicinal
Chemistry, Multidisciplinary
Collaboration
Computer Science
Computer Science, Information Systems
Computer Science, Interdisciplinary Applications
Life Sciences & Biomedicine
Machine Learning and Deep Learning
Modelling
Pharmacology & Pharmacy
Physical Sciences
Science & Technology
Technology
Trade secrets
title Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T00%3A02%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acs_j&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Collaborative%20Profile-QSAR:%20A%20Natural%20Platform%20for%20Building%20Collaborative%20Models%20among%20Competing%20Companies&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Martin,%20Eric%20J&rft.date=2021-04-26&rft.volume=61&rft.issue=4&rft.spage=1603&rft.epage=1616&rft.pages=1603-1616&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.0c01342&rft_dat=%3Cproquest_acs_j%3E2524943150%3C/proquest_acs_j%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2524943150&rft_id=info:pmid/33844519&rfr_iscdi=true