Compare the performance of multiple binary classification models in microbial high-throughput sequencing datasets

The development of machine learning and deep learning provided solutions for predicting microbiota response on environmental change based on microbial high-throughput sequencing. However, there were few studies specifically clarifying the performance and practical of two types of binary classificati...

Detailed description

Bibliographic details
Published in:	The Science of the total environment, 2022-09, Vol.837, p.155807-155807, Article 155807
Main authors:	Xu, Nuohan; Zhang, Zhenyan; Shen, Yechao; Zhang, Qi; Liu, Zhen; Yu, Yitian; Wang, Yan; Lei, Chaotang; Ke, Mingjing; Qiu, Danyan; Lu, Tao; Chen, Yiling; Xiong, Juntao; Qian, Haifeng
Format:	Article
Language:	eng
Subjects:	Deep learning; Ecotoxicology; Machine learning; Metadata analysis; Microbiota
Online access:	Full text

container_end_page	155807
container_issue
container_start_page	155807
container_title	The Science of the total environment
container_volume	837
creator	Xu, Nuohan; Zhang, Zhenyan; Shen, Yechao; Zhang, Qi; Liu, Zhen; Yu, Yitian; Wang, Yan; Lei, Chaotang; Ke, Mingjing; Qiu, Danyan; Lu, Tao; Chen, Yiling; Xiong, Juntao; Qian, Haifeng
description	The development of machine learning and deep learning has provided solutions for predicting microbiota responses to environmental change based on microbial high-throughput sequencing. However, few studies have specifically clarified the performance and practicality of these two types of binary classification models in order to find a better algorithm for microbiota data analysis. Here, for the first time, we evaluated the performance, accuracy and running time of binary classification models built with three machine learning methods, random forest (RF), support vector machine (SVM) and logistic regression (LR), and one deep learning method, back propagation neural network (BPNN). The models were built on microbiota datasets from which low-quality variables had been removed and in which the class imbalance problem had been solved. Additionally, we optimized the models by tuning. Our study demonstrated that dataset pre-processing was a necessary step in model construction. Among the four binary classification models, BPNN and RF were the most suitable methods for constructing microbiota binary classification models. When the four models were used to predict multiple microbial datasets, BPNN showed the highest accuracy and the most robust performance, while RF ranked second. We also constructed the optimal models by adjusting the epochs of BPNN and the n_estimators of RF six times. The evaluation of model performance provides a road map for applying artificial intelligence to assess microbial ecology. • Dataset preprocessing was a necessary process for model building. • The deep learning model displayed the highest prediction accuracy among all models. • Deep learning had the most robust performance on microbial datasets. • The optimal models were built by adjusting the parameters.
doi_str_mv	10.1016/j.scitotenv.2022.155807
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2662546166</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0048969722029047</els_id><sourcerecordid>2662546166</sourcerecordid><originalsourceid>FETCH-LOGICAL-c371t-405737d508485c7a67a0e660e1b831136b37a9c5cc77044cf69a527d9e42551b3</originalsourceid><addsrcrecordid>eNqFkD9PwzAQxS0EoqXwFcAjS4qdxHYyVhX_pEosMFuOc2lcJXFqO5X49rhqYeWWu-Hde3c_hB4oWVJC-dNu6bUJNsBwWKYkTZeUsYKICzSnhSgTSlJ-ieaE5EVS8lLM0I33OxJLFPQazTLGMsFIOUf7te1H5QCHFvAIrrGuV4MGbBvcT10wYwe4MoNy31h3ynvTGK2CsQPubQ2dxyZORjtbGdXh1mzbJLTOTtt2nAL2sJ9g0GbY4loF5SH4W3TVqM7D3bkv0NfL8-f6Ldl8vL6vV5tEZ4KGJCdMZKJmpMgLpoXiQhHgnACtiozSjFeZUKVmWgtB8lw3vFQsFXUJecoYrbIFejz5js7GI3yQvfEauk4NYCcvU85TlnPKeZSKkzS-4b2DRo7O9PFlSYk88pY7-cdbHnnLE--4eX8Omaoe6r-9X8BRsDoJIio4GHBHo0gEauNAB1lb82_ID44Il64</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2662546166</pqid></control><display><type>article</type><title>Compare the performance of multiple binary classification models in microbial high-throughput sequencing datasets</title><source>Access via ScienceDirect (Elsevier)</source><creator>Xu, Nuohan ; Zhang, Zhenyan ; Shen, Yechao ; Zhang, Qi ; Liu, Zhen ; Yu, Yitian ; Wang, Yan ; Lei, Chaotang ; Ke, Mingjing ; Qiu, Danyan ; Lu, Tao ; Chen, Yiling ; Xiong, Juntao ; Qian, Haifeng</creator><creatorcontrib>Xu, Nuohan ; Zhang, Zhenyan ; Shen, Yechao ; Zhang, Qi ; Liu, Zhen ; Yu, Yitian ; Wang, Yan ; Lei, Chaotang ; Ke, Mingjing ; Qiu, Danyan ; Lu, Tao ; Chen, Yiling ; Xiong, Juntao ; Qian, Haifeng</creatorcontrib><description>The development of machine learning and deep learning provided solutions for predicting microbiota response on environmental change based on microbial high-throughput sequencing. 
However, there were few studies specifically clarifying the performance and practical of two types of binary classification models to find a better algorithm for the microbiota data analysis. Here, for the first time, we evaluated the performance, accuracy and running time of the binary classification models built by three machine learning methods - random forest (RF), support vector machine (SVM), logistic regression (LR), and one deep learning method - back propagation neural network (BPNN). The built models were based on the microbiota datasets that removed low-quality variables and solved the class imbalance problem. Additionally, we optimized the models by tuning. Our study demonstrated that dataset pre-processing was a necessary process for model construction. Among these 4 binary classification models, BPNN and RF were the most suitable methods for constructing microbiota binary classification models. Using these 4 models to predict multiple microbial datasets, BPNN showed the highest accuracy and the most robust performance, while the RF method was ranked second. We also constructed the optimal models by adjusting the epochs of BPNN and the n_estimators of RF for six times. The evaluation related to performances of models provided a road map for the application of artificial intelligence to assess microbial ecology. 
[Display omitted] •Dataset preprocessing was a necessary process for model building.•Deep learning model displayed the highest prediction accuracy among all models.•Deep learning had the most robust performance based on microbial datasets.•The optimal models were built by adjusting the parameters.</description><identifier>ISSN: 0048-9697</identifier><identifier>EISSN: 1879-1026</identifier><identifier>DOI: 10.1016/j.scitotenv.2022.155807</identifier><identifier>PMID: 35537509</identifier><language>eng</language><publisher>Netherlands: Elsevier B.V</publisher><subject>Deep learning ; Ecotoxicology ; Machine learning ; Metadata analysis ; Microbiota</subject><ispartof>The Science of the total environment, 2022-09, Vol.837, p.155807-155807, Article 155807</ispartof><rights>2022 Elsevier B.V.</rights><rights>Copyright © 2022 Elsevier B.V. All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c371t-405737d508485c7a67a0e660e1b831136b37a9c5cc77044cf69a527d9e42551b3</citedby><cites>FETCH-LOGICAL-c371t-405737d508485c7a67a0e660e1b831136b37a9c5cc77044cf69a527d9e42551b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.scitotenv.2022.155807$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35537509$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xu, Nuohan</creatorcontrib><creatorcontrib>Zhang, Zhenyan</creatorcontrib><creatorcontrib>Shen, Yechao</creatorcontrib><creatorcontrib>Zhang, Qi</creatorcontrib><creatorcontrib>Liu, Zhen</creatorcontrib><creatorcontrib>Yu, Yitian</creatorcontrib><creatorcontrib>Wang, Yan</creatorcontrib><creatorcontrib>Lei, 
Chaotang</creatorcontrib><creatorcontrib>Ke, Mingjing</creatorcontrib><creatorcontrib>Qiu, Danyan</creatorcontrib><creatorcontrib>Lu, Tao</creatorcontrib><creatorcontrib>Chen, Yiling</creatorcontrib><creatorcontrib>Xiong, Juntao</creatorcontrib><creatorcontrib>Qian, Haifeng</creatorcontrib><title>Compare the performance of multiple binary classification models in microbial high-throughput sequencing datasets</title><title>The Science of the total environment</title><addtitle>Sci Total Environ</addtitle><description>The development of machine learning and deep learning provided solutions for predicting microbiota response on environmental change based on microbial high-throughput sequencing. However, there were few studies specifically clarifying the performance and practical of two types of binary classification models to find a better algorithm for the microbiota data analysis. Here, for the first time, we evaluated the performance, accuracy and running time of the binary classification models built by three machine learning methods - random forest (RF), support vector machine (SVM), logistic regression (LR), and one deep learning method - back propagation neural network (BPNN). The built models were based on the microbiota datasets that removed low-quality variables and solved the class imbalance problem. Additionally, we optimized the models by tuning. Our study demonstrated that dataset pre-processing was a necessary process for model construction. Among these 4 binary classification models, BPNN and RF were the most suitable methods for constructing microbiota binary classification models. Using these 4 models to predict multiple microbial datasets, BPNN showed the highest accuracy and the most robust performance, while the RF method was ranked second. We also constructed the optimal models by adjusting the epochs of BPNN and the n_estimators of RF for six times. 
The evaluation related to performances of models provided a road map for the application of artificial intelligence to assess microbial ecology. [Display omitted] •Dataset preprocessing was a necessary process for model building.•Deep learning model displayed the highest prediction accuracy among all models.•Deep learning had the most robust performance based on microbial datasets.•The optimal models were built by adjusting the parameters.</description><subject>Deep learning</subject><subject>Ecotoxicology</subject><subject>Machine learning</subject><subject>Metadata analysis</subject><subject>Microbiota</subject><issn>0048-9697</issn><issn>1879-1026</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqFkD9PwzAQxS0EoqXwFcAjS4qdxHYyVhX_pEosMFuOc2lcJXFqO5X49rhqYeWWu-Hde3c_hB4oWVJC-dNu6bUJNsBwWKYkTZeUsYKICzSnhSgTSlJ-ieaE5EVS8lLM0I33OxJLFPQazTLGMsFIOUf7te1H5QCHFvAIrrGuV4MGbBvcT10wYwe4MoNy31h3ynvTGK2CsQPubQ2dxyZORjtbGdXh1mzbJLTOTtt2nAL2sJ9g0GbY4loF5SH4W3TVqM7D3bkv0NfL8-f6Ldl8vL6vV5tEZ4KGJCdMZKJmpMgLpoXiQhHgnACtiozSjFeZUKVmWgtB8lw3vFQsFXUJecoYrbIFejz5js7GI3yQvfEauk4NYCcvU85TlnPKeZSKkzS-4b2DRo7O9PFlSYk88pY7-cdbHnnLE--4eX8Omaoe6r-9X8BRsDoJIio4GHBHo0gEauNAB1lb82_ID44Il64</recordid><startdate>20220901</startdate><enddate>20220901</enddate><creator>Xu, Nuohan</creator><creator>Zhang, Zhenyan</creator><creator>Shen, Yechao</creator><creator>Zhang, Qi</creator><creator>Liu, Zhen</creator><creator>Yu, Yitian</creator><creator>Wang, Yan</creator><creator>Lei, Chaotang</creator><creator>Ke, Mingjing</creator><creator>Qiu, Danyan</creator><creator>Lu, Tao</creator><creator>Chen, Yiling</creator><creator>Xiong, Juntao</creator><creator>Qian, Haifeng</creator><general>Elsevier B.V</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20220901</creationdate><title>Compare the performance of multiple binary classification models in microbial 
high-throughput sequencing datasets</title><author>Xu, Nuohan ; Zhang, Zhenyan ; Shen, Yechao ; Zhang, Qi ; Liu, Zhen ; Yu, Yitian ; Wang, Yan ; Lei, Chaotang ; Ke, Mingjing ; Qiu, Danyan ; Lu, Tao ; Chen, Yiling ; Xiong, Juntao ; Qian, Haifeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c371t-405737d508485c7a67a0e660e1b831136b37a9c5cc77044cf69a527d9e42551b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Deep learning</topic><topic>Ecotoxicology</topic><topic>Machine learning</topic><topic>Metadata analysis</topic><topic>Microbiota</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xu, Nuohan</creatorcontrib><creatorcontrib>Zhang, Zhenyan</creatorcontrib><creatorcontrib>Shen, Yechao</creatorcontrib><creatorcontrib>Zhang, Qi</creatorcontrib><creatorcontrib>Liu, Zhen</creatorcontrib><creatorcontrib>Yu, Yitian</creatorcontrib><creatorcontrib>Wang, Yan</creatorcontrib><creatorcontrib>Lei, Chaotang</creatorcontrib><creatorcontrib>Ke, Mingjing</creatorcontrib><creatorcontrib>Qiu, Danyan</creatorcontrib><creatorcontrib>Lu, Tao</creatorcontrib><creatorcontrib>Chen, Yiling</creatorcontrib><creatorcontrib>Xiong, Juntao</creatorcontrib><creatorcontrib>Qian, Haifeng</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>The Science of the total environment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xu, Nuohan</au><au>Zhang, Zhenyan</au><au>Shen, Yechao</au><au>Zhang, Qi</au><au>Liu, Zhen</au><au>Yu, Yitian</au><au>Wang, Yan</au><au>Lei, Chaotang</au><au>Ke, Mingjing</au><au>Qiu, Danyan</au><au>Lu, Tao</au><au>Chen, Yiling</au><au>Xiong, Juntao</au><au>Qian, Haifeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Compare the 
performance of multiple binary classification models in microbial high-throughput sequencing datasets</atitle><jtitle>The Science of the total environment</jtitle><addtitle>Sci Total Environ</addtitle><date>2022-09-01</date><risdate>2022</risdate><volume>837</volume><spage>155807</spage><epage>155807</epage><pages>155807-155807</pages><artnum>155807</artnum><issn>0048-9697</issn><eissn>1879-1026</eissn><abstract>The development of machine learning and deep learning provided solutions for predicting microbiota response on environmental change based on microbial high-throughput sequencing. However, there were few studies specifically clarifying the performance and practical of two types of binary classification models to find a better algorithm for the microbiota data analysis. Here, for the first time, we evaluated the performance, accuracy and running time of the binary classification models built by three machine learning methods - random forest (RF), support vector machine (SVM), logistic regression (LR), and one deep learning method - back propagation neural network (BPNN). The built models were based on the microbiota datasets that removed low-quality variables and solved the class imbalance problem. Additionally, we optimized the models by tuning. Our study demonstrated that dataset pre-processing was a necessary process for model construction. Among these 4 binary classification models, BPNN and RF were the most suitable methods for constructing microbiota binary classification models. Using these 4 models to predict multiple microbial datasets, BPNN showed the highest accuracy and the most robust performance, while the RF method was ranked second. We also constructed the optimal models by adjusting the epochs of BPNN and the n_estimators of RF for six times. The evaluation related to performances of models provided a road map for the application of artificial intelligence to assess microbial ecology. 
[Display omitted] •Dataset preprocessing was a necessary process for model building.•Deep learning model displayed the highest prediction accuracy among all models.•Deep learning had the most robust performance based on microbial datasets.•The optimal models were built by adjusting the parameters.</abstract><cop>Netherlands</cop><pub>Elsevier B.V</pub><pmid>35537509</pmid><doi>10.1016/j.scitotenv.2022.155807</doi><tpages>1</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0048-9697
ispartof	The Science of the total environment, 2022-09, Vol.837, p.155807-155807, Article 155807
issn	0048-9697 1879-1026
language	eng
recordid	cdi_proquest_miscellaneous_2662546166
source	Access via ScienceDirect (Elsevier)
subjects	Deep learning; Ecotoxicology; Machine learning; Metadata analysis; Microbiota
title	Compare the performance of multiple binary classification models in microbial high-throughput sequencing datasets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T21%3A42%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Compare%20the%20performance%20of%20multiple%20binary%20classification%20models%20in%20microbial%20high-throughput%20sequencing%20datasets&rft.jtitle=The%20Science%20of%20the%20total%20environment&rft.au=Xu,%20Nuohan&rft.date=2022-09-01&rft.volume=837&rft.spage=155807&rft.epage=155807&rft.pages=155807-155807&rft.artnum=155807&rft.issn=0048-9697&rft.eissn=1879-1026&rft_id=info:doi/10.1016/j.scitotenv.2022.155807&rft_dat=%3Cproquest_cross%3E2662546166%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2662546166&rft_id=info:pmid/35537509&rft_els_id=S0048969722029047&rfr_iscdi=true
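
The abstract in this record states that the models were built on datasets from which low-quality variables had been removed and in which class imbalance had been solved, but the record does not say how either step was done. The following is only a minimal sketch of one plausible reading, assuming a prevalence filter (drop taxa that are non-zero in too few samples) and random oversampling of the minority class. The function names, the 0.3 threshold, and the toy abundance table are all hypothetical and are not taken from the article.

```python
import random
from collections import Counter


def filter_low_prevalence(samples, min_prevalence=0.1):
    """Keep feature columns that are non-zero in at least min_prevalence of samples.

    Assumption: "low-quality variable" is read here as a rarely observed taxon.
    """
    n_samples = len(samples)
    n_features = len(samples[0])
    keep = [
        j for j in range(n_features)
        if sum(1 for row in samples if row[j] > 0) / n_samples >= min_prevalence
    ]
    return [[row[j] for j in keep] for row in samples], keep


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size.

    Assumption: exactly two classes are present (binary classification).
    """
    rng = random.Random(seed)
    (major, n_major), (minor, n_minor) = Counter(labels).most_common(2)
    minority_rows = [row for row, y in zip(samples, labels) if y == minor]
    extra = [rng.choice(minority_rows) for _ in range(n_major - n_minor)]
    return samples + extra, labels + [minor] * len(extra)


# Hypothetical toy abundance table: 6 samples x 4 taxa.
X = [
    [5, 0, 2, 0],
    [3, 0, 0, 0],
    [8, 0, 1, 0],
    [1, 0, 4, 0],
    [0, 0, 7, 1],
    [2, 0, 3, 0],
]
y = [0, 0, 0, 0, 1, 1]

X_filtered, kept = filter_low_prevalence(X, min_prevalence=0.3)
X_balanced, y_balanced = oversample_minority(X_filtered, y)
print(kept, Counter(y_balanced))
```

In a real evaluation, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the test set; the sketch skips the split to stay short.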