Multi-label classification and label dependence in in silico toxicity prediction

Bibliographic details
Published in:	Toxicology in vitro, 2021-08, Vol. 74, Article 105157
Authors:	Yap, Xiu Huan; Raymer, Michael
Format:	Article
Language:	English
Subjects:	Classification; Classifiers; Computer applications; Label dependence; Learning; Model accuracy; Multi-label classification; Partitioning; Prediction models; Predictions; Regression analysis; Stacking; Statistical analysis; Tox21; Toxicity; Toxicity prediction
Online access:	Full text

container_end_page	105157
container_issue
container_start_page	105157
container_title	Toxicology in vitro
container_volume	74
creator	Yap, Xiu Huan; Raymer, Michael
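description	Most computational predictive models are specifically trained for a single toxicity endpoint and lack the ability to learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of 3 multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. Also, we develop a novel label dependence measure that shows full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance using Hamming and multi-label accuracy scores (p < 0.05), while SBR show significant improvements in multi-label accuracy scores. The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning using a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their corresponding random or no label partitioning counterparts. Additionally, using the Random Forest classifier in a 10-fold stratified cross validation Stacking model, we find that the top-performing stacking model out-performs the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could potentially boost the performance of current single-endpoint predictive models and that label partitioning learning may be used in place of random label partitionings. •Toxicity endpoints show high degree of marginal label dependency.•Multi-label classification models utilize label dependencies for better performance.•An original dependency score is designed for data with low prior probabilities.•Data-driven, learned label partitioning is an alternative to random partitioning.•In most Tox21 labels, Stacking outperforms the base Random Forest classifier.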
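The description reports results in terms of Hamming and multi-label accuracy scores. The record does not define either metric. The sketch below therefore assumes two common definitions: Hamming score as 1 minus the Hamming loss, and multi-label accuracy as the mean per-compound Jaccard index. It also assumes a complete 0/1 label matrix. Real Tox21 label matrices contain missing values, which this sketch does not handle. The function names are hypothetical.

```python
from typing import Sequence

# Rows are compounds, columns are toxicity labels (1 = active, 0 = inactive).
LabelMatrix = Sequence[Sequence[int]]


def hamming_score(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of (compound, label) cells predicted correctly, i.e. 1 - Hamming loss."""
    cells = [(t, p) for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]
    return sum(1 for t, p in cells if t == p) / len(cells)


def multilabel_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Mean per-compound Jaccard index |true AND predicted| / |true OR predicted|.
    A compound with no true and no predicted positives scores 1.0 (assumed convention)."""
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t and p)
        union = sum(1 for t, p in zip(t_row, p_row) if t or p)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Toy example: 3 compounds x 3 labels (invented values, not Tox21 data).
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(hamming_score(y_true, y_pred), multilabel_accuracy(y_true, y_pred))
```

Hamming score rewards every correct cell, including the many correct negatives. Multi-label accuracy only looks at labels that are active in either the truth or the prediction. This makes it more sensitive at the low prior probabilities typical of toxicity endpoints.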
doi_str_mv	10.1016/j.tiv.2021.105157
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2511896450</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0887233321000825</els_id><sourcerecordid>2511896450</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-4249a5f033fb389fcf792cd182f801cc03f0b7c1f454b16fe6a4ca708c2e2e0c3</originalsourceid><addsrcrecordid>eNp9kE1r3DAQhkVJaTZpf0AvxZBLL97qw7JkciqhTQIJySE9C3k8glm89layQ_LvI-M0hx4CA2LQ874MD2NfBd8KLuofu-1Ej1vJpci7Ftp8YBthTVMqYcwR23BrTSmVUsfsJKUd51xbyT-xY6WsaqSqNuz-du4nKnvfYl9A71OiQOAnGofCD12xfnR4wKHDAbCgYZlEPcFYTOMTAU3PxSFiR7CkPrOPwfcJv7y-p-zP718PF1flzd3l9cXPmxKUFVNZyarxOnClQqtsEyCYRkInrAyWCwCuAm8NiFDpqhV1wNpX4A23IFEiB3XKvq-9hzj-nTFNbk8JsO_9gOOcnNRC2KauNM_o2X_obpzjkK_LlKq1Mro2mRIrBXFMKWJwh0h7H5-d4G7R7XYu63aLbrfqzplvr81zu8fuLfHPbwbOVwCzikfC6BLQ4rGjiDC5bqR36l8A7h6PtA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2536537567</pqid></control><display><type>article</type><title>Multi-label classification and label dependence in in silico toxicity prediction</title><source>Elsevier ScienceDirect Journals</source><creator>Yap, Xiu Huan ; Raymer, Michael</creator><creatorcontrib>Yap, Xiu Huan ; Raymer, Michael</creatorcontrib><description>Most computational predictive models are specifically trained for a single toxicity endpoint and lack the ability to learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of 3 multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. Also, we develop a novel label dependence measure that shows full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. 
Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance using Hamming and multi-label accuracy scores (p<0.05), while SBR show significant improvements in multi-label accuracy scores. The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning using a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their corresponding random or no label partitioning counterparts. Additionally, using the Random Forest classifier in a 10-fold stratified cross validation Stacking model, we find that the top-performing stacking model out-performs the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could potentially boost the performance of current single-endpoint predictive models and that label partitioning learning may be used in place of random label partitionings. 
•Toxicity endpoints show high degree of marginal label dependency.•Multi-label classification models utilize label dependencies for better performance.•An original dependency score is designed for data with low prior probabilities.•Data-driven, learned label partitioning is an alternative to random partitioning.•In most Tox21 labels, Stacking outperforms the base Random Forest classifier.</description><identifier>ISSN: 0887-2333</identifier><identifier>EISSN: 1879-3177</identifier><identifier>DOI: 10.1016/j.tiv.2021.105157</identifier><identifier>PMID: 33839234</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Classification ; Classifiers ; Computer applications ; Label dependence ; Learning ; Model accuracy ; Multi-label classification ; Partitioning ; Prediction models ; Predictions ; Regression analysis ; Stacking ; Statistical analysis ; Tox21 ; Toxicity ; Toxicity prediction</subject><ispartof>Toxicology in vitro, 2021-08, Vol.74, p.105157-105157, Article 105157</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright © 2021 Elsevier Ltd. All rights reserved.</rights><rights>Copyright Elsevier Science Ltd. 
Aug 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-4249a5f033fb389fcf792cd182f801cc03f0b7c1f454b16fe6a4ca708c2e2e0c3</citedby><cites>FETCH-LOGICAL-c381t-4249a5f033fb389fcf792cd182f801cc03f0b7c1f454b16fe6a4ca708c2e2e0c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0887233321000825$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33839234$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yap, Xiu Huan</creatorcontrib><creatorcontrib>Raymer, Michael</creatorcontrib><title>Multi-label classification and label dependence in in silico toxicity prediction</title><title>Toxicology in vitro</title><addtitle>Toxicol In Vitro</addtitle><description>Most computational predictive models are specifically trained for a single toxicity endpoint and lack the ability to learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of 3 multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. Also, we develop a novel label dependence measure that shows full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance using Hamming and multi-label accuracy scores (p<0.05), while SBR show significant improvements in multi-label accuracy scores. 
The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning using a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their corresponding random or no label partitioning counterparts. Additionally, using the Random Forest classifier in a 10-fold stratified cross validation Stacking model, we find that the top-performing stacking model out-performs the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could potentially boost the performance of current single-endpoint predictive models and that label partitioning learning may be used in place of random label partitionings. •Toxicity endpoints show high degree of marginal label dependency.•Multi-label classification models utilize label dependencies for better performance.•An original dependency score is designed for data with low prior probabilities.•Data-driven, learned label partitioning is an alternative to random partitioning.•In most Tox21 labels, Stacking outperforms the base Random Forest classifier.</description><subject>Classification</subject><subject>Classifiers</subject><subject>Computer applications</subject><subject>Label dependence</subject><subject>Learning</subject><subject>Model accuracy</subject><subject>Multi-label classification</subject><subject>Partitioning</subject><subject>Prediction models</subject><subject>Predictions</subject><subject>Regression analysis</subject><subject>Stacking</subject><subject>Statistical analysis</subject><subject>Tox21</subject><subject>Toxicity</subject><subject>Toxicity 
prediction</subject><issn>0887-2333</issn><issn>1879-3177</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE1r3DAQhkVJaTZpf0AvxZBLL97qw7JkciqhTQIJySE9C3k8glm89layQ_LvI-M0hx4CA2LQ874MD2NfBd8KLuofu-1Ej1vJpci7Ftp8YBthTVMqYcwR23BrTSmVUsfsJKUd51xbyT-xY6WsaqSqNuz-du4nKnvfYl9A71OiQOAnGofCD12xfnR4wKHDAbCgYZlEPcFYTOMTAU3PxSFiR7CkPrOPwfcJv7y-p-zP718PF1flzd3l9cXPmxKUFVNZyarxOnClQqtsEyCYRkInrAyWCwCuAm8NiFDpqhV1wNpX4A23IFEiB3XKvq-9hzj-nTFNbk8JsO_9gOOcnNRC2KauNM_o2X_obpzjkK_LlKq1Mro2mRIrBXFMKWJwh0h7H5-d4G7R7XYu63aLbrfqzplvr81zu8fuLfHPbwbOVwCzikfC6BLQ4rGjiDC5bqR36l8A7h6PtA</recordid><startdate>202108</startdate><enddate>202108</enddate><creator>Yap, Xiu Huan</creator><creator>Raymer, Michael</creator><general>Elsevier Ltd</general><general>Elsevier Science Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TK</scope><scope>7U7</scope><scope>C1K</scope><scope>7X8</scope></search><sort><creationdate>202108</creationdate><title>Multi-label classification and label dependence in in silico toxicity prediction</title><author>Yap, Xiu Huan ; Raymer, Michael</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-4249a5f033fb389fcf792cd182f801cc03f0b7c1f454b16fe6a4ca708c2e2e0c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Classification</topic><topic>Classifiers</topic><topic>Computer applications</topic><topic>Label dependence</topic><topic>Learning</topic><topic>Model accuracy</topic><topic>Multi-label classification</topic><topic>Partitioning</topic><topic>Prediction models</topic><topic>Predictions</topic><topic>Regression analysis</topic><topic>Stacking</topic><topic>Statistical analysis</topic><topic>Tox21</topic><topic>Toxicity</topic><topic>Toxicity 
prediction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yap, Xiu Huan</creatorcontrib><creatorcontrib>Raymer, Michael</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Environmental Sciences and Pollution Management</collection><collection>MEDLINE - Academic</collection><jtitle>Toxicology in vitro</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yap, Xiu Huan</au><au>Raymer, Michael</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi-label classification and label dependence in in silico toxicity prediction</atitle><jtitle>Toxicology in vitro</jtitle><addtitle>Toxicol In Vitro</addtitle><date>2021-08</date><risdate>2021</risdate><volume>74</volume><spage>105157</spage><epage>105157</epage><pages>105157-105157</pages><artnum>105157</artnum><issn>0887-2333</issn><eissn>1879-3177</eissn><abstract>Most computational predictive models are specifically trained for a single toxicity endpoint and lack the ability to learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of 3 multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. Also, we develop a novel label dependence measure that shows full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance using Hamming and multi-label accuracy scores (p<0.05), while SBR show significant improvements in multi-label accuracy scores. 
The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning using a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their corresponding random or no label partitioning counterparts. Additionally, using the Random Forest classifier in a 10-fold stratified cross validation Stacking model, we find that the top-performing stacking model out-performs the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could potentially boost the performance of current single-endpoint predictive models and that label partitioning learning may be used in place of random label partitionings. •Toxicity endpoints show high degree of marginal label dependency.•Multi-label classification models utilize label dependencies for better performance.•An original dependency score is designed for data with low prior probabilities.•Data-driven, learned label partitioning is an alternative to random partitioning.•In most Tox21 labels, Stacking outperforms the base Random Forest classifier.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>33839234</pmid><doi>10.1016/j.tiv.2021.105157</doi><tpages>1</tpages></addata></record>
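The record compares Binary Relevance (independent per-label models) with Classifier Chains and Stacking (SBR), which both let a label's model see other labels. Below is a minimal, dependency-free Python sketch of how each transformation routes label information. It uses a hand-written gradient-descent logistic regression as a stand-in base classifier. The record states that Logistic Regression was used, but not which implementation or settings.

The sketch has several limits:
- All function names are hypothetical.
- Label Powerset and random label partitioning (k = 3) are not shown.
- The out-of-fold step in `fit_sbr` uses a simple interleaved k-fold split, not the 10-fold stratified cross validation the record mentions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Labels = List[List[int]]


def predict_proba(w: Vector, x: Vector) -> float:
    """Logistic model with w = [bias, w_1, ..., w_d]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-50.0, min(50.0, z))  # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X: List[Vector], y: List[int], epochs: int = 300, lr: float = 0.5) -> Vector:
    """Unregularized batch gradient-descent logistic regression (stand-in base classifier)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_br(X: List[Vector], Y: Labels) -> List[Vector]:
    """Binary Relevance: one independent model per label."""
    return [fit_logreg(X, [row[j] for row in Y]) for j in range(len(Y[0]))]


def predict_br(models: List[Vector], x: Vector) -> List[int]:
    return [int(predict_proba(w, x) >= 0.5) for w in models]


def fit_cc(X: List[Vector], Y: Labels) -> List[Vector]:
    """Classifier Chain: model j also sees labels 0..j-1 (true values at training time)."""
    models = []
    for j in range(len(Y[0])):
        Xj = [xi + [float(v) for v in yi[:j]] for xi, yi in zip(X, Y)]
        models.append(fit_logreg(Xj, [yi[j] for yi in Y]))
    return models


def predict_cc(models: List[Vector], x: Vector) -> List[int]:
    preds: List[int] = []
    for w in models:  # earlier predictions are fed forward along the chain
        preds.append(int(predict_proba(w, x + [float(p) for p in preds]) >= 0.5))
    return preds


def fit_sbr(X: List[Vector], Y: Labels, k: int = 3) -> Tuple[List[Vector], List[Vector]]:
    """Stacked Binary Relevance: level-2 model j sees x plus level-1 scores for ALL labels.
    Level-2 training features come from out-of-fold level-1 models to limit leakage."""
    n = len(X)
    meta: List[Vector] = [[] for _ in range(n)]
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        base = fit_br([X[i] for i in train], [Y[i] for i in train])
        for i in range(f, n, k):  # held-out rows of fold f
            meta[i] = [predict_proba(w, X[i]) for w in base]
    level1 = fit_br(X, Y)
    level2 = fit_br([X[i] + meta[i] for i in range(n)], Y)
    return level1, level2


def predict_sbr(models: Tuple[List[Vector], List[Vector]], x: Vector) -> List[int]:
    level1, level2 = models
    meta = [predict_proba(w, x) for w in level1]
    return predict_br(level2, x + meta)


# Invented toy data: label 1 duplicates label 0, i.e. a perfectly dependent label pair.
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
for name, fit, predict in [("BR", fit_br, predict_br), ("CC", fit_cc, predict_cc), ("SBR", fit_sbr, predict_sbr)]:
    model = fit(X, Y)
    print(name, predict(model, [2.0]), predict(model, [-2.0]))
```

On this toy set BR is also correct, because label 1 is predictable from x alone. The example only shows where label information enters each model, not a performance difference. In the stacked model, the level-2 weights on the other labels' scores are where a dependence between labels would show up.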
fulltext	fulltext
identifier	ISSN: 0887-2333
ispartof	Toxicology in vitro, 2021-08, Vol.74, p.105157-105157, Article 105157
issn	0887-2333 (print); 1879-3177 (electronic)
language	eng
recordid	cdi_proquest_miscellaneous_2511896450
source	Elsevier ScienceDirect Journals
subjects	Classification; Classifiers; Computer applications; Label dependence; Learning; Model accuracy; Multi-label classification; Partitioning; Prediction models; Predictions; Regression analysis; Stacking; Statistical analysis; Tox21; Toxicity; Toxicity prediction
title	Multi-label classification and label dependence in in silico toxicity prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T12%3A33%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-label%20classification%20and%20label%20dependence%20in%20in%20silico%20toxicity%20prediction&rft.jtitle=Toxicology%20in%20vitro&rft.au=Yap,%20Xiu%20Huan&rft.date=2021-08&rft.volume=74&rft.spage=105157&rft.epage=105157&rft.pages=105157-105157&rft.artnum=105157&rft.issn=0887-2333&rft.eissn=1879-3177&rft_id=info:doi/10.1016/j.tiv.2021.105157&rft_dat=%3Cproquest_cross%3E2511896450%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2536537567&rft_id=info:pmid/33839234&rft_els_id=S0887233321000825&rfr_iscdi=true
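The abstract describes combining an original label dependence measure with Louvain community detection to learn label partitions. That measure is not specified in this record, so the sketch below uses two stand-ins:
- A plain Jaccard co-occurrence score as a hypothetical dependence measure. It is non-negative, which modularity-based methods require.
- Only the first (local-moving) phase of Louvain, written in pure Python, without the community-aggregation phase.

All names are hypothetical. A real pipeline would more likely call a library routine such as networkx's `louvain_communities`.

```python
from itertools import combinations
from typing import Dict, List, Set

# Adjacency map: label -> {neighbour label: edge weight}
Graph = Dict[str, Dict[str, float]]


def jaccard_dependence(Y: List[List[int]], i: int, j: int) -> float:
    """Hypothetical stand-in score: co-active compounds / compounds active for either label."""
    both = sum(1 for row in Y if row[i] and row[j])
    either = sum(1 for row in Y if row[i] or row[j])
    return both / either if either else 0.0


def dependence_graph(Y: List[List[int]], names: List[str]) -> Graph:
    """Undirected weighted label graph; pairs scoring 0 get no edge."""
    adj: Graph = {name: {} for name in names}
    for i, j in combinations(range(len(names)), 2):
        w = jaccard_dependence(Y, i, j)
        if w > 0:
            adj[names[i]][names[j]] = w
            adj[names[j]][names[i]] = w
    return adj


def louvain_local_moving(adj: Graph) -> List[Set[str]]:
    """First (local-moving) phase of Louvain: greedily move each node to the
    neighbouring community with the largest modularity gain until nothing moves.
    The aggregation phase of full Louvain is omitted."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    if two_m == 0:
        return [{u} for u in adj]
    comm = {u: u for u in adj}  # community id of each node
    tot = dict(degree)          # total degree of each community
    moved = True
    while moved:
        moved = False
        for u in adj:
            old = comm[u]
            tot[old] -= degree[u]  # take u out of its community
            links: Dict[str, float] = {}
            for v, w in adj[u].items():
                links[comm[v]] = links.get(comm[v], 0.0) + w
            # gain of joining c is proportional to k_in(u, c) - tot(c) * k_u / 2m
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm[u] = best
            tot[best] += degree[u]
            moved = moved or best != old
    groups: Dict[str, Set[str]] = {}
    for u, c in comm.items():
        groups.setdefault(c, set()).add(u)
    return list(groups.values())


# Invented toy label matrix: A/B tend to co-occur, as do C/D.
Y = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [1, 0, 0, 1]]
print(louvain_local_moving(dependence_graph(Y, ["A", "B", "C", "D"])))
```

Each resulting community would then serve as one label partition, for example one Label Powerset or one chain per group. Local moving depends on node visiting order, which is one reason full Louvain implementations add aggregation and randomized ordering.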