Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity

Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental...


Bibliographic details
Published in: Journal of chemical information and modeling 2021-12, Vol.61 (12), p.5793-5803
Main authors: Feinstein, Jeremy, Sivaraman, Ganesh, Picel, Kurt, Peters, Brian, Vázquez-Mayagoitia, Álvaro, Ramanathan, Arvind, MacDonell, Margaret, Foster, Ian, Yan, Eugene
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 5803
container_issue 12
container_start_page 5793
container_title Journal of chemical information and modeling
container_volume 61
creator Feinstein, Jeremy
Sivaraman, Ganesh
Picel, Kurt
Peters, Brian
Vázquez-Mayagoitia, Álvaro
Ramanathan, Arvind
MacDonell, Margaret
Foster, Ian
Yan, Eugene
description Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.
doi_str_mv 10.1021/acs.jcim.1c01204
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1835619</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2610414325</sourcerecordid><originalsourceid>FETCH-LOGICAL-a433t-d540e1689405bd47fabc511af1e2382a94bb89f481802dcbca0f9be1797e9dee3</originalsourceid><addsrcrecordid>eNp1kcFr2zAUh8VYWbtu952G2S471JmeJCvWsXRbWwis0BR2E7L8tDlzpFSyof7vqzRJKYOdJMT3-z2ePkI-AJ0BZfDV2DRb2W49A0uBUfGKnEAlVKkk_fX6cK-UPCZvU1pRyrmS7A055kLRiov6hLR33mIcTOeHqbz2LsQ1tsU3xE2xjMYnh7FYoIm-87-L4IobjK4fQwym_zv1hfFtcRP66eXb7dikweTaYhkeOtsN0zty5Eyf8P3-PCV3P74vL67Kxc_L64vzRWkE50PZVoIiyFoJWjWtmDvT2ArAOEDGa2aUaJpaOVFDTVlrG2uoUw3CXM1RtYj8lHza9YY0dDrl0Wj_2OA92kFDzSsJKkNfdtAmhvsR06DXXbLY98ZjGJNmEqgAwVmV0c__oKswRp9X2FK1YEJKmSm6o2wMKUV0ehO7tYmTBqq3mnTWpLea9F5TjnzcF49N_u_nwMFLBs52wFP0MPS_fY_VSZ9E</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2618424666</pqid></control><display><type>article</type><title>Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity</title><source>MEDLINE</source><source>ACS Publications</source><creator>Feinstein, Jeremy ; Sivaraman, Ganesh ; Picel, Kurt ; Peters, Brian ; Vázquez-Mayagoitia, Álvaro ; Ramanathan, Arvind ; MacDonell, Margaret ; Foster, Ian ; Yan, Eugene</creator><creatorcontrib>Feinstein, Jeremy ; Sivaraman, Ganesh ; Picel, Kurt ; Peters, Brian ; Vázquez-Mayagoitia, Álvaro ; Ramanathan, Arvind ; MacDonell, Margaret ; Foster, Ian ; Yan, Eugene ; Argonne National Lab. (ANL), Argonne, IL (United States)</creatorcontrib><description>Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. 
However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. 
To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/acs.jcim.1c01204</identifier><identifier>PMID: 34905348</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Animals ; Artificial neural networks ; Bioaccumulation ; Biocompatibility ; Chemical bonds ; Environmental protection ; Fluorocarbons - chemistry ; Fluorocarbons - toxicity ; Gaussian process ; In vivo methods and tests ; Industrial applications ; Knowledge management ; Layers ; Machine Learning ; Machine Learning and Deep Learning ; Molecules ; Neural networks ; Neural Networks, Computer ; Organic compounds ; Perfluoroalkyl &amp; polyfluoroalkyl substances ; Rats ; Rodent models ; Toxicity ; Uncertainty</subject><ispartof>Journal of chemical information and modeling, 2021-12, Vol.61 (12), p.5793-5803</ispartof><rights>2021 UChicago Argonne, LLC. 
Published by American Chemical Society</rights><rights>Copyright American Chemical Society Dec 27, 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a433t-d540e1689405bd47fabc511af1e2382a94bb89f481802dcbca0f9be1797e9dee3</citedby><cites>FETCH-LOGICAL-a433t-d540e1689405bd47fabc511af1e2382a94bb89f481802dcbca0f9be1797e9dee3</cites><orcidid>0000-0002-1415-6300 ; 0000-0001-9056-9855 ; 0000-0002-7112-7397 ; 0000000190569855 ; 0000000214156300 ; 0000000271127397</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jcim.1c01204$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jcim.1c01204$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>230,314,780,784,885,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34905348$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1835619$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Feinstein, Jeremy</creatorcontrib><creatorcontrib>Sivaraman, Ganesh</creatorcontrib><creatorcontrib>Picel, Kurt</creatorcontrib><creatorcontrib>Peters, Brian</creatorcontrib><creatorcontrib>Vázquez-Mayagoitia, Álvaro</creatorcontrib><creatorcontrib>Ramanathan, Arvind</creatorcontrib><creatorcontrib>MacDonell, Margaret</creatorcontrib><creatorcontrib>Foster, Ian</creatorcontrib><creatorcontrib>Yan, Eugene</creatorcontrib><creatorcontrib>Argonne National Lab. (ANL), Argonne, IL (United States)</creatorcontrib><title>Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity</title><title>Journal of chemical information and modeling</title><addtitle>J. Chem. Inf. 
Model</addtitle><description>Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. 
To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.</description><subject>Animals</subject><subject>Artificial neural networks</subject><subject>Bioaccumulation</subject><subject>Biocompatibility</subject><subject>Chemical bonds</subject><subject>Environmental protection</subject><subject>Fluorocarbons - chemistry</subject><subject>Fluorocarbons - toxicity</subject><subject>Gaussian process</subject><subject>In vivo methods and tests</subject><subject>Industrial applications</subject><subject>Knowledge management</subject><subject>Layers</subject><subject>Machine Learning</subject><subject>Machine Learning and Deep Learning</subject><subject>Molecules</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Organic compounds</subject><subject>Perfluoroalkyl &amp; polyfluoroalkyl substances</subject><subject>Rats</subject><subject>Rodent models</subject><subject>Toxicity</subject><subject>Uncertainty</subject><issn>1549-9596</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kcFr2zAUh8VYWbtu952G2S471JmeJCvWsXRbWwis0BR2E7L8tDlzpFSyof7vqzRJKYOdJMT3-z2ePkI-AJ0BZfDV2DRb2W49A0uBUfGKnEAlVKkk_fX6cK-UPCZvU1pRyrmS7A055kLRiov6hLR33mIcTOeHqbz2LsQ1tsU3xE2xjMYnh7FYoIm-87-L4IobjK4fQwym_zv1hfFtcRP66eXb7dikweTaYhkeOtsN0zty5Eyf8P3-PCV3P74vL67Kxc_L64vzRWkE50PZVoIiyFoJWjWtmDvT2ArAOEDGa2aUaJpaOVFDTVlrG2uoUw3CXM1RtYj8lHza9YY0dDrl0Wj_2OA92kFDzSsJKkNfdtAmhvsR06DXXbLY98ZjGJNmEqgAwVmV0c__oKswRp9X2FK1YEJKmSm6o2wMKUV0ehO7tYmTBqq3mnTWpLea9F5TjnzcF49N_u_nwMFLBs52wFP0MPS_fY_VSZ9E</recordid><startdate>20211227</startdate><enddate>20211227</enddate><creator>Feinstein, Jeremy</creator><creator>Sivaraman, 
Ganesh</creator><creator>Picel, Kurt</creator><creator>Peters, Brian</creator><creator>Vázquez-Mayagoitia, Álvaro</creator><creator>Ramanathan, Arvind</creator><creator>MacDonell, Margaret</creator><creator>Foster, Ian</creator><creator>Yan, Eugene</creator><general>American Chemical Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-1415-6300</orcidid><orcidid>https://orcid.org/0000-0001-9056-9855</orcidid><orcidid>https://orcid.org/0000-0002-7112-7397</orcidid><orcidid>https://orcid.org/0000000190569855</orcidid><orcidid>https://orcid.org/0000000214156300</orcidid><orcidid>https://orcid.org/0000000271127397</orcidid></search><sort><creationdate>20211227</creationdate><title>Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity</title><author>Feinstein, Jeremy ; Sivaraman, Ganesh ; Picel, Kurt ; Peters, Brian ; Vázquez-Mayagoitia, Álvaro ; Ramanathan, Arvind ; MacDonell, Margaret ; Foster, Ian ; Yan, Eugene</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a433t-d540e1689405bd47fabc511af1e2382a94bb89f481802dcbca0f9be1797e9dee3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Animals</topic><topic>Artificial neural networks</topic><topic>Bioaccumulation</topic><topic>Biocompatibility</topic><topic>Chemical bonds</topic><topic>Environmental protection</topic><topic>Fluorocarbons - chemistry</topic><topic>Fluorocarbons - toxicity</topic><topic>Gaussian process</topic><topic>In vivo methods and tests</topic><topic>Industrial 
applications</topic><topic>Knowledge management</topic><topic>Layers</topic><topic>Machine Learning</topic><topic>Machine Learning and Deep Learning</topic><topic>Molecules</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Organic compounds</topic><topic>Perfluoroalkyl &amp; polyfluoroalkyl substances</topic><topic>Rats</topic><topic>Rodent models</topic><topic>Toxicity</topic><topic>Uncertainty</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Feinstein, Jeremy</creatorcontrib><creatorcontrib>Sivaraman, Ganesh</creatorcontrib><creatorcontrib>Picel, Kurt</creatorcontrib><creatorcontrib>Peters, Brian</creatorcontrib><creatorcontrib>Vázquez-Mayagoitia, Álvaro</creatorcontrib><creatorcontrib>Ramanathan, Arvind</creatorcontrib><creatorcontrib>MacDonell, Margaret</creatorcontrib><creatorcontrib>Foster, Ian</creatorcontrib><creatorcontrib>Yan, Eugene</creatorcontrib><creatorcontrib>Argonne National Lab. (ANL), Argonne, IL (United States)</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><jtitle>Journal of chemical 
information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Feinstein, Jeremy</au><au>Sivaraman, Ganesh</au><au>Picel, Kurt</au><au>Peters, Brian</au><au>Vázquez-Mayagoitia, Álvaro</au><au>Ramanathan, Arvind</au><au>MacDonell, Margaret</au><au>Foster, Ian</au><au>Yan, Eugene</au><aucorp>Argonne National Lab. (ANL), Argonne, IL (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity</atitle><jtitle>Journal of chemical information and modeling</jtitle><addtitle>J. Chem. Inf. Model</addtitle><date>2021-12-27</date><risdate>2021</risdate><volume>61</volume><issue>12</issue><spage>5793</spage><epage>5803</epage><pages>5793-5803</pages><issn>1549-9596</issn><eissn>1549-960X</eissn><abstract>Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulation. A growing, increasingly diverse inventory of PFAS, including 8163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. However, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. 
A total of 519 fluorinated compounds containing two or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best-performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>34905348</pmid><doi>10.1021/acs.jcim.1c01204</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-1415-6300</orcidid><orcidid>https://orcid.org/0000-0001-9056-9855</orcidid><orcidid>https://orcid.org/0000-0002-7112-7397</orcidid><orcidid>https://orcid.org/0000000190569855</orcidid><orcidid>https://orcid.org/0000000214156300</orcidid><orcidid>https://orcid.org/0000000271127397</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2021-12, Vol.61 (12), p.5793-5803
issn 1549-9596
1549-960X
language eng
recordid cdi_osti_scitechconnect_1835619
source MEDLINE; ACS Publications
subjects Animals
Artificial neural networks
Bioaccumulation
Biocompatibility
Chemical bonds
Environmental protection
Fluorocarbons - chemistry
Fluorocarbons - toxicity
Gaussian process
In vivo methods and tests
Industrial applications
Knowledge management
Layers
Machine Learning
Machine Learning and Deep Learning
Molecules
Neural networks
Neural Networks, Computer
Organic compounds
Perfluoroalkyl & polyfluoroalkyl substances
Rats
Rodent models
Toxicity
Uncertainty
title Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T05%3A55%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Uncertainty-Informed%20Deep%20Transfer%20Learning%20of%20Perfluoroalkyl%20and%20Polyfluoroalkyl%20Substance%20Toxicity&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Feinstein,%20Jeremy&rft.aucorp=Argonne%20National%20Lab.%20(ANL),%20Argonne,%20IL%20(United%20States)&rft.date=2021-12-27&rft.volume=61&rft.issue=12&rft.spage=5793&rft.epage=5803&rft.pages=5793-5803&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.1c01204&rft_dat=%3Cproquest_osti_%3E2610414325%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2618424666&rft_id=info:pmid/34905348&rfr_iscdi=true