A data-driven binary-classification framework for oil fingerprinting analysis

Published in: Environmental research 2021-10, Vol.201, p.111454-111454, Article 111454
Main authors: Chen, Yifu; Chen, Bing; Song, Xing; Kang, Qiao; Ye, Xudong; Zhang, Baiyu
Format: Article
Language: English
Online access: Full text
Description
Abstract: A marine oil spill is one of the most challenging environmental issues, with severe long-term impacts on ecosystems and human society. Oil dispersants are widely applied as treating agents in oil spill response operations. Dispersant use significantly changes the behavior of dispersed oil and consequently complicates oil fingerprinting analysis. In this study, machine learning (ML) was introduced for the first time to oil fingerprinting analysis by developing a data-driven binary classification framework. The modeling integrated dimensionality reduction algorithms (e.g., principal component analysis, PCA) to distinguish weathered crude oil (WCO) from chemically dispersed oil (CDO). Five groups of biomarkers were selected: terpanes, steranes, triaromatic steranes (TA-steranes), monoaromatic steranes (MA-steranes), and diamantanes. Different feature spaces were created from the diagnostic indices of the biomarkers, and six ML algorithms were applied for comparative analysis and to optimize the modeling process: k-nearest neighbor (KNN), support vector classifier (SVC), random forest classifier (RFC), decision tree classifier (DTC), logistic regression classifier (LRC), and ensemble vote classifier (EVC). Hyperparameter optimization and cross-validation through GridSearchCV were applied to prevent overfitting and increase model accuracy. Model performance was evaluated by model score and by F-score derived from confusion matrices. The results indicated that the RFC algorithm on the diamantanes dataset performed best. It delivered the highest F-score (0.871), versus the lowest F-score (0.792) from the EVC algorithm on the TA-steranes dataset, by PCA with a variance of 95%. Therefore, diamantanes were recommended as the most suitable biomarker for distinguishing WCO and CDO to aid oil fingerprinting under the conditions of this study. The results showed the proposed method to be a potential analysis tool for oil spill source identification through ML-aided oil fingerprinting. The study also showed the value of ML methods in oil spill response research and practice.
• Machine learning was introduced to aid oil fingerprinting in distinguishing weathered and chemically dispersed crude oil.
• Six machine learning methods integrated with dimensionality reduction algorithms were employed for comparative analysis.
• Diamantanes were recommended as the most suitable biomarker in the study case based on modeling performance.
• The method was shown to be feasible for oil spill source identification by availing of the strength of mach...
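
The workflow named in the abstract (PCA retaining 95% of the variance, a classifier tuned with GridSearchCV, and an F-score read from a confusion matrix) can be sketched roughly as below. This is only an illustration using scikit-learn, not the authors' code. The synthetic data stands in for a table of biomarker diagnostic indices, and the feature scaling step, the train/test split, and the hyperparameter grid are all assumptions that the record does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic two-class data standing in for biomarker
# diagnostic indices (rows = samples, columns = indices); the labels
# 0 and 1 play the role of the two oil classes (e.g. WCO vs CDO).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)

# ASSUMPTION: a 75/25 stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # ASSUMPTION: scaling before PCA
    ("pca", PCA(n_components=0.95)),  # keep components explaining 95% of variance
    ("clf", RandomForestClassifier(random_state=0)),
])

# ASSUMPTION: an illustrative hyperparameter grid, not the one used in the study.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [3, None],
}

# Grid search with 5-fold cross-validation, selecting on F-score.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Evaluate the best model on held-out data via the confusion matrix.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

# F-score (F1) computed directly from the confusion-matrix counts.
f_from_cm = 2 * tp / (2 * tp + fp + fn)

print("best parameters:", search.best_params_)
print("confusion matrix:\n", cm)
print("F-score:", round(float(f_from_cm), 3))
```

Swapping `RandomForestClassifier` for another estimator (for example `KNeighborsClassifier` or `SVC`) and adjusting `param_grid` accordingly would give the kind of side-by-side comparison the abstract describes.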
ISSN: 0013-9351, 1096-0953
DOI: 10.1016/j.envres.2021.111454