A freshwater algae classification system based on machine learning with StyleGAN2-ADA augmentation for limited and imbalanced datasets


Bibliographic details
Published in: Water research (Oxford) 2023-09, Vol.243, p.120409-120409, Article 120409
Main authors: Chan, Wang Hin; Fung, Benjamin S.B.; Tsang, Danny H.K.; Lo, Irene M.C.
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Overcame the challenge of limited and imbalanced data in automated algae classification.
• Image augmentation with StyleGAN2-ADA notably improved algae classification models.
• Improvement of 16% in accuracy and 21% in F1-score for rare algae classification.
• Generated algal images can be used to train models effectively and efficiently.

Automated algae classification using machine learning is more efficient and effective than manual classification, which can be tedious and time-consuming. However, the practical application of such a classification approach is restricted by the scarcity of labeled freshwater algae datasets, especially for rarer algae. To overcome these challenges, this study proposes generating artificial algal images with StyleGAN2-ADA and using both the generated and real images to train machine-learning-driven algae classification models. This approach significantly enhances the performance of classification models, particularly their ability to identify rare algae. Overall, the proposed approach improves the F1-score of lightweight MobileNetV3 classification models covering all 20 freshwater algae considered in this research from 88.4% to 96.2%, while for the models that cover only the rarer algae, the experiments show an improvement from 80% to 96.5% in F1-score. The results show that the approach enables the trained algae classification systems to effectively cover algae with limited image data.
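The idea in the summary, topping up scarce classes with generated images and then scoring the classifier by F1, can be illustrated with a minimal sketch. This is not the authors' code. The helper names `plan_synthetic_topup` and `macro_f1`, the class labels, and the choice of macro-averaged F1 are all assumptions made for illustration; the record does not state how the generated images were apportioned or how F1 was averaged.

```python
from collections import Counter


def plan_synthetic_topup(real_labels, target_per_class=None):
    """Hypothetical helper: number of generated images each class would need
    so that every class reaches the same image count.

    By default the target is the size of the largest real class (an assumption).
    """
    counts = Counter(real_labels)
    if target_per_class is None:
        target_per_class = max(counts.values())
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative labels only; "common" and "rare" are placeholder class names.
    real = ["common"] * 5 + ["rare"] * 2
    print(plan_synthetic_topup(real))
    print(macro_f1(["common", "common", "rare", "rare"],
                   ["common", "common", "rare", "common"]))
```

In a real pipeline the top-up counts would determine how many images to sample from a generator trained per class, and the F1 comparison would be made on held-out real images only.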
ISSN: 0043-1354, 1879-2448
DOI:10.1016/j.watres.2023.120409