Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method

• The algal alert system is used worldwide for proactive management.
• Alert levels of algal bloom were classified using two machine learning models.
• Imbalanced class data induced biased training of the machine learning models.
• Synthetic data by ADASYN increased the performance of the machine learning...

Detailed description

Bibliographic details
Published in: Water research (Oxford) 2021-12, Vol.207, p.117821-117821, Article 117821
Main authors: Kim, Jin Hwi, Shin, Jae-Ki, Lee, Hankyu, Lee, Dong Hoon, Kang, Joo-Hyon, Cho, Kyung Hwa, Lee, Yong-Gu, Chon, Kangmin, Baek, Sang-Soo, Park, Yongeun
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 117821
container_issue
container_start_page 117821
container_title Water research (Oxford)
container_volume 207
creator Kim, Jin Hwi
Shin, Jae-Ki
Lee, Hankyu
Lee, Dong Hoon
Kang, Joo-Hyon
Cho, Kyung Hwa
Lee, Yong-Gu
Chon, Kangmin
Baek, Sang-Soo
Park, Yongeun
description • The algal alert system is used worldwide for proactive management. • Alert levels of algal bloom were classified using two machine learning models. • Imbalanced class data induced biased training of the machine learning models. • Synthetic data by ADASYN increased the performance of the machine learning models.
Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system, based on sampling of cell density, is used to indicate the bloom status and to inform rapid and adequate responses from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive actions regarding management practices. In this study, two machine learning models, an artificial neural network (ANN) and a support vector machine (SVM), were constructed for the timely prediction of alert levels of algal bloom using eight years’ worth of meteorological, hydrodynamic, and water quality data in a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the imbalanced proportions of the alert levels in the output variable lead to biased training of the data-driven model and degrade its prediction performance. Therefore, synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance for the caution level (L1) and warning level (L2) in the models constructed using a combination of original and synthetic data was higher than that of the models constructed using original data only.
In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data showed, during both training (including validation) and testing, distinctly improved recall and precision values for L1, which is a very critical alert level because it indicates a transition from normalcy to bloom formation. In addition, both optimal models constructed using synthetic-added data exhibited improvement in recall and precision of more than 33.7% when predicting L1 and L2 during the test. Therefore, the application of synthetic data can improve the detection performance of machine learning models by solving the imbalance of observed data. Reliable prediction by the improved models can be used to aid the design of management practices to mitigate algal blooms within a reservoir.
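The ADASYN oversampling mentioned in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function name `adasyn`, its parameters (`k`, `beta`, `seed`), the single-minority-class simplification, and the toy data are all assumptions made for illustration. The idea shown is that minority samples surrounded by more majority-class neighbours receive more synthetic points, each created by interpolating toward a nearby minority sample.

```python
import math
import random
from collections import Counter


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (illustrative sketch, not the paper's code)."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = counts[minority]
    n_maj = max(c for label, c in counts.items() if label != minority)
    total = round((n_maj - n_min) * beta)  # target number of synthetic samples
    if total <= 0 or n_min < 2:
        return []

    min_idx = [i for i, label in enumerate(y) if label == minority]

    def nearest(i, pool, count):
        # Indices from pool (excluding i), closest to X[i] first.
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # r_i: share of non-minority points among the k nearest neighbours.
    ratios = []
    for i in min_idx:
        nn = nearest(i, range(len(X)), k)
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))
    norm = sum(ratios)
    # Fall back to uniform weights if no minority point has majority neighbours.
    weights = [r / norm for r in ratios] if norm else [1 / n_min] * n_min

    synthetic = []
    for i, w in zip(min_idx, weights):
        neighbours = nearest(i, min_idx, k)  # minority-only neighbours
        # Rounding per sample means the final count is only approximately total.
        for _ in range(round(w * total)):
            j = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (made up): 12 "normal" samples (0) and 3 "caution" samples (1).
X = [[x / 10, (x % 4) / 10] for x in range(12)] + [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
y = [0] * 12 + [1] * 3
new_points = adasyn(X, y, minority=1, k=3)
print(len(new_points), "synthetic samples")
```

In practice the synthetic points would be appended to the training set only, so that the test data remain purely observed; a multi-class setting such as several alert levels would need the procedure repeated per minority class.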
doi_str_mv 10.1016/j.watres.2021.117821
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2598075923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0043135421010150</els_id><sourcerecordid>2598075923</sourcerecordid><originalsourceid>FETCH-LOGICAL-c413t-d4bd14f5122e42d0c967288426e42d38619bf91763b35de657cb4f85b8510aae3</originalsourceid><addsrcrecordid>eNp9kc9u1DAQxi0EotvCGyDkI5cs_pfEuSChqkClSlzgbDn2pOuVHQc72WpfgafGIUuPnEae-c188vch9I6SPSW0-XjcP-k5Qd4zwuie0lYy-gLtqGy7igkhX6IdIYJXlNfiCl3nfCSEMMa71-iKi1ZSKsUO_b4PU4onNz7i-QB4gjTEFPRoAMcBB20ObgTsQadxZUK04DMuDC4tf8ZPl0GBDzqFYfFY-0ftce9jDBkveZ3qEWurp9mdAOfzWJRmZ3DWYfJ_r8J8iPYNejVon-Htpd6gn1_uftx-qx6-f72__fxQGUH5XFnRWyqGmjIGglliuqZlUgrWrE8uG9r1Q0fbhve8ttDUrenFIOte1pRoDfwGfdjulo__WiDPKrhswHs9QlyyYnUnSVt3jBdUbKhJMecEg5qSCzqdFSVqTUEd1ZaCWlNQWwpl7f1FYekD2Oelf7YX4NMGFDPh5CCpbBwU061LYGZlo_u_wh8Ug5z6</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2598075923</pqid></control><display><type>article</type><title>Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Kim, Jin Hwi ; Shin, Jae-Ki ; Lee, Hankyu ; Lee, Dong Hoon ; Kang, Joo-Hyon ; Cho, Kyung Hwa ; Lee, Yong-Gu ; Chon, Kangmin ; Baek, Sang-Soo ; Park, Yongeun</creator><creatorcontrib>Kim, Jin Hwi ; Shin, Jae-Ki ; Lee, Hankyu ; Lee, Dong Hoon ; Kang, Joo-Hyon ; Cho, Kyung Hwa ; Lee, Yong-Gu ; Chon, Kangmin ; Baek, Sang-Soo ; Park, Yongeun</creatorcontrib><description>•The algal alert system is used worldwide for proactive management.•Alert levels of algal bloom were classified using two machine learning models.•Imbalanced class data induced biased training of the machine learning models.•Synthetic data by ADASYN increased the performance of the machine learning models. 
Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system-based sampling of cell density is used to intimate the bloom status and to inform rapid and adequate response from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive actions regarding management practices. In this study, two machine learning models: artificial neural network (ANN) and support vector machine (SVM), were constructed for the timely prediction of alert levels of algal bloom using eight years’ worth of meteorological, hydrodynamic, and water quality data in a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the proportion imbalance on all alert level data as the output variable leads to biased training of the data-driven model and degradation of model prediction performance. Therefore, the synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance yielded by the caution level (L1) and warning level (L2) in the models constructed using a combination of original and synthetic data was higher than the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data during both training (including validation) and test generated distinctively improved recall and precision values of L1, which is a very critical alert level as it indicates a transition status from normalcy to bloom formation. 
In addition, both optimal models constructed using synthetic-added data exhibited improvement in recall and precision by more than 33.7% while predicting L-1 and L-2 during the test. Therefore, the application of synthetic data can improve detection performance of machine learning models by solving the imbalance of observed data. Reliable prediction by the improved models can be used to aid the design of management practices to mitigate algal blooms within a reservoir. [Display omitted]</description><identifier>ISSN: 0043-1354</identifier><identifier>EISSN: 1879-2448</identifier><identifier>DOI: 10.1016/j.watres.2021.117821</identifier><identifier>PMID: 34781184</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>ADASYN ; Alert level ; Early warning ; Environmental Monitoring ; Harmful Algal Bloom ; Harmful algal blooms ; Machine Learning ; Neural Networks, Computer ; Water Quality</subject><ispartof>Water research (Oxford), 2021-12, Vol.207, p.117821-117821, Article 117821</ispartof><rights>2021 Elsevier Ltd</rights><rights>Copyright © 2021 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c413t-d4bd14f5122e42d0c967288426e42d38619bf91763b35de657cb4f85b8510aae3</citedby><cites>FETCH-LOGICAL-c413t-d4bd14f5122e42d0c967288426e42d38619bf91763b35de657cb4f85b8510aae3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0043135421010150$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34781184$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kim, Jin Hwi</creatorcontrib><creatorcontrib>Shin, Jae-Ki</creatorcontrib><creatorcontrib>Lee, Hankyu</creatorcontrib><creatorcontrib>Lee, Dong Hoon</creatorcontrib><creatorcontrib>Kang, Joo-Hyon</creatorcontrib><creatorcontrib>Cho, Kyung Hwa</creatorcontrib><creatorcontrib>Lee, Yong-Gu</creatorcontrib><creatorcontrib>Chon, Kangmin</creatorcontrib><creatorcontrib>Baek, Sang-Soo</creatorcontrib><creatorcontrib>Park, Yongeun</creatorcontrib><title>Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method</title><title>Water research (Oxford)</title><addtitle>Water Res</addtitle><description>•The algal alert system is used worldwide for proactive management.•Alert levels of algal bloom were classified using two machine learning models.•Imbalanced class data induced biased training of the machine learning models.•Synthetic data by ADASYN increased the performance of the machine learning models. Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. 
The current alert system-based sampling of cell density is used to intimate the bloom status and to inform rapid and adequate response from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive actions regarding management practices. In this study, two machine learning models: artificial neural network (ANN) and support vector machine (SVM), were constructed for the timely prediction of alert levels of algal bloom using eight years’ worth of meteorological, hydrodynamic, and water quality data in a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the proportion imbalance on all alert level data as the output variable leads to biased training of the data-driven model and degradation of model prediction performance. Therefore, the synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance yielded by the caution level (L1) and warning level (L2) in the models constructed using a combination of original and synthetic data was higher than the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data during both training (including validation) and test generated distinctively improved recall and precision values of L1, which is a very critical alert level as it indicates a transition status from normalcy to bloom formation. In addition, both optimal models constructed using synthetic-added data exhibited improvement in recall and precision by more than 33.7% while predicting L-1 and L-2 during the test. 
Therefore, the application of synthetic data can improve detection performance of machine learning models by solving the imbalance of observed data. Reliable prediction by the improved models can be used to aid the design of management practices to mitigate algal blooms within a reservoir. [Display omitted]</description><subject>ADASYN</subject><subject>Alert level</subject><subject>Early warning</subject><subject>Environmental Monitoring</subject><subject>Harmful Algal Bloom</subject><subject>Harmful algal blooms</subject><subject>Machine Learning</subject><subject>Neural Networks, Computer</subject><subject>Water Quality</subject><issn>0043-1354</issn><issn>1879-2448</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kc9u1DAQxi0EotvCGyDkI5cs_pfEuSChqkClSlzgbDn2pOuVHQc72WpfgafGIUuPnEae-c188vch9I6SPSW0-XjcP-k5Qd4zwuie0lYy-gLtqGy7igkhX6IdIYJXlNfiCl3nfCSEMMa71-iKi1ZSKsUO_b4PU4onNz7i-QB4gjTEFPRoAMcBB20ObgTsQadxZUK04DMuDC4tf8ZPl0GBDzqFYfFY-0ftce9jDBkveZ3qEWurp9mdAOfzWJRmZ3DWYfJ_r8J8iPYNejVon-Htpd6gn1_uftx-qx6-f72__fxQGUH5XFnRWyqGmjIGglliuqZlUgrWrE8uG9r1Q0fbhve8ttDUrenFIOte1pRoDfwGfdjulo__WiDPKrhswHs9QlyyYnUnSVt3jBdUbKhJMecEg5qSCzqdFSVqTUEd1ZaCWlNQWwpl7f1FYekD2Oelf7YX4NMGFDPh5CCpbBwU061LYGZlo_u_wh8Ug5z6</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Kim, Jin Hwi</creator><creator>Shin, Jae-Ki</creator><creator>Lee, Hankyu</creator><creator>Lee, Dong Hoon</creator><creator>Kang, Joo-Hyon</creator><creator>Cho, Kyung Hwa</creator><creator>Lee, Yong-Gu</creator><creator>Chon, Kangmin</creator><creator>Baek, Sang-Soo</creator><creator>Park, Yongeun</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20211201</creationdate><title>Improving the 
performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method</title><author>Kim, Jin Hwi ; Shin, Jae-Ki ; Lee, Hankyu ; Lee, Dong Hoon ; Kang, Joo-Hyon ; Cho, Kyung Hwa ; Lee, Yong-Gu ; Chon, Kangmin ; Baek, Sang-Soo ; Park, Yongeun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c413t-d4bd14f5122e42d0c967288426e42d38619bf91763b35de657cb4f85b8510aae3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>ADASYN</topic><topic>Alert level</topic><topic>Early warning</topic><topic>Environmental Monitoring</topic><topic>Harmful Algal Bloom</topic><topic>Harmful algal blooms</topic><topic>Machine Learning</topic><topic>Neural Networks, Computer</topic><topic>Water Quality</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kim, Jin Hwi</creatorcontrib><creatorcontrib>Shin, Jae-Ki</creatorcontrib><creatorcontrib>Lee, Hankyu</creatorcontrib><creatorcontrib>Lee, Dong Hoon</creatorcontrib><creatorcontrib>Kang, Joo-Hyon</creatorcontrib><creatorcontrib>Cho, Kyung Hwa</creatorcontrib><creatorcontrib>Lee, Yong-Gu</creatorcontrib><creatorcontrib>Chon, Kangmin</creatorcontrib><creatorcontrib>Baek, Sang-Soo</creatorcontrib><creatorcontrib>Park, Yongeun</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Water research (Oxford)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kim, Jin Hwi</au><au>Shin, Jae-Ki</au><au>Lee, Hankyu</au><au>Lee, Dong Hoon</au><au>Kang, Joo-Hyon</au><au>Cho, Kyung Hwa</au><au>Lee, Yong-Gu</au><au>Chon, 
Kangmin</au><au>Baek, Sang-Soo</au><au>Park, Yongeun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method</atitle><jtitle>Water research (Oxford)</jtitle><addtitle>Water Res</addtitle><date>2021-12-01</date><risdate>2021</risdate><volume>207</volume><spage>117821</spage><epage>117821</epage><pages>117821-117821</pages><artnum>117821</artnum><issn>0043-1354</issn><eissn>1879-2448</eissn><abstract>•The algal alert system is used worldwide for proactive management.•Alert levels of algal bloom were classified using two machine learning models.•Imbalanced class data induced biased training of the machine learning models.•Synthetic data by ADASYN increased the performance of the machine learning models. Many countries have attempted to monitor and predict harmful algal blooms to mitigate related problems and establish management practices. The current alert system-based sampling of cell density is used to intimate the bloom status and to inform rapid and adequate response from water-associated organizations. The objective of this study was to develop an early warning system for cyanobacterial blooms to allow for efficient decision making prior to the occurrence of algal blooms and to guide preemptive actions regarding management practices. In this study, two machine learning models: artificial neural network (ANN) and support vector machine (SVM), were constructed for the timely prediction of alert levels of algal bloom using eight years’ worth of meteorological, hydrodynamic, and water quality data in a reservoir where harmful cyanobacterial blooms frequently occur during summer. However, the proportion imbalance on all alert level data as the output variable leads to biased training of the data-driven model and degradation of model prediction performance. 
Therefore, the synthetic data generated by an adaptive synthetic (ADASYN) sampling method were used to resolve the imbalance of minority class data in the original data and to improve the prediction performance of the models. The results showed that the overall prediction performance yielded by the caution level (L1) and warning level (L2) in the models constructed using a combination of original and synthetic data was higher than the models constructed using original data only. In particular, the optimal ANN and SVM constructed using a combination of original and synthetic data during both training (including validation) and test generated distinctively improved recall and precision values of L1, which is a very critical alert level as it indicates a transition status from normalcy to bloom formation. In addition, both optimal models constructed using synthetic-added data exhibited improvement in recall and precision by more than 33.7% while predicting L-1 and L-2 during the test. Therefore, the application of synthetic data can improve detection performance of machine learning models by solving the imbalance of observed data. Reliable prediction by the improved models can be used to aid the design of management practices to mitigate algal blooms within a reservoir. [Display omitted]</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>34781184</pmid><doi>10.1016/j.watres.2021.117821</doi><tpages>1</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0043-1354
ispartof Water research (Oxford), 2021-12, Vol.207, p.117821-117821, Article 117821
issn 0043-1354
1879-2448
language eng
recordid cdi_proquest_miscellaneous_2598075923
source MEDLINE; Elsevier ScienceDirect Journals
subjects ADASYN
Alert level
Early warning
Environmental Monitoring
Harmful Algal Bloom
Harmful algal blooms
Machine Learning
Neural Networks, Computer
Water Quality
title Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T16%3A29%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20the%20performance%20of%20machine%20learning%20models%20for%20early%20warning%20of%20harmful%20algal%20blooms%20using%20an%20adaptive%20synthetic%20sampling%20method&rft.jtitle=Water%20research%20(Oxford)&rft.au=Kim,%20Jin%20Hwi&rft.date=2021-12-01&rft.volume=207&rft.spage=117821&rft.epage=117821&rft.pages=117821-117821&rft.artnum=117821&rft.issn=0043-1354&rft.eissn=1879-2448&rft_id=info:doi/10.1016/j.watres.2021.117821&rft_dat=%3Cproquest_cross%3E2598075923%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2598075923&rft_id=info:pmid/34781184&rft_els_id=S0043135421010150&rfr_iscdi=true