Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa identification of freshwater macroinvertebrates

Macroinvertebrate samples are commonly used in biomonitoring to study changes on aquatic ecosystems. Traditionally, specimens are identified manually to taxa by human experts being time‐consuming and cost intensive. Using the image data of 35 taxa and 64 features, we propose a novel variant of the q...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Environmetrics (London, Ont.) Ont.), 2013-06, Vol.24 (4), p.248-259
Hauptverfasser: Ärje, J., Kärkkäinen, S., Turpeinen, T., Meissner, K.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Macroinvertebrate samples are commonly used in biomonitoring to study changes on aquatic ecosystems. Traditionally, specimens are identified manually to taxa by human experts being time‐consuming and cost intensive. Using the image data of 35 taxa and 64 features, we propose a novel variant of the quadratic discriminant analysis for breaking the curse of dimensionality in quadratic discriminant analysis models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to random forest. We explore several variations of RBA. We consider three classification (i.e taxa identification) decisions: majority vote, averaged posterior probabilities, and a novel approach; a score of weighted votes. Besides modifying the voting, we propose to weight features according to their importance instead of eliminating the least important features. We compared the performance of RBA with traditional Bayesian and several other popular classification methods and assessed how the methods behave in relation to each other and the different macroinvertebrate species. Further, we investigate how severely misclassifications affect the performance of different methods when set into a biomonitoring context. We found that the lowest and least severe classification error (i.e. most accurate taxa identification) was achieved with RBA by using averaged posterior probabilities and weighted features. Copyright © 2013 John Wiley & Sons, Ltd.
ISSN:1180-4009
1099-095X
DOI:10.1002/env.2208