Supervised learning algorithms in the classification of plant populations with different degrees of kinship

Bibliographic details
Published in:	Brazilian Journal of Botany 2021-06, Vol.44 (2), p.371-379
Main authors:	Skowronski, Leandro; de Moraes, Paula Martin; de Moraes, Mario Luiz Teixeira; Gonçalves, Wesley Nunes; Constantino, Michel; Costa, Celso Soares; Fava, Wellington Santos; Costa, Reginaldo B.
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Analysis; Bayesian analysis; Biomedical and Life Sciences; Botany; Classification; Confidence intervals; Data mining; Decision trees; Discriminant analysis; Evaluation; Flowers & plants; Genetic diversity; Genetics & Evolutionary Biology - Short Communication; Learning algorithms; Life Sciences; Machine learning; Multilayers; Neural networks; Plant populations; Plant Systematics/Taxonomy/Biogeography; Population genetics; Population studies; Populations; Similarity; Statistical analysis; Supervised learning; Support vector machines
Online access:	Full text

Description
Abstract:	Population discrimination and the classification of individuals are of great importance for genetic improvement in population studies and for the conservation of genetic diversity. Multivariate approaches are often used for these tasks, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown promise for such procedures, but these methods still need further evaluation and comparison. The present study therefore evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM), and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data from populations with different degrees of genetic similarity. The data were derived from simulated genotypic information for different populations subjected to a backcrossing scheme. Mean accuracies from 30 repetitions of each classification method were compared using the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on machine learning algorithms outperformed the Fisher and Anderson discriminant functions, achieving high accuracy where the similarity between populations was higher. The kNN, Random Forest, SVM, and Naive Bayes algorithms had the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at the 96.88% similarity condition between populations). The present work thus confirms that ML techniques achieve greater accuracy in the discrimination and classification of populations without the limitations of statistical techniques.
ISSN:	0100-8404, 1806-9959
DOI:	10.1007/s40415-021-00703-1
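
Illustrative note: the comparison step described in the abstract (accuracies from repeated runs of each method, compared with the Friedman and Nemenyi tests) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The accuracy values are invented, three hypothetical methods and four repetitions stand in for the study's methods and 30 repetitions, the Friedman statistic is computed without a tie correction, and the Nemenyi critical value 2.343 (three methods, alpha = 0.05) is assumed from standard published tables.

```python
from math import sqrt


def average_ranks(row):
    """Rank the accuracies of one repetition (1 = best), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied accuracies.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman(acc):
    """Return (mean rank per method, Friedman chi-square without tie correction)."""
    n, k = len(acc), len(acc[0])
    rank_rows = [average_ranks(r) for r in acc]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


def nemenyi_cd(q_alpha, k, n):
    """Critical difference in mean ranks for the Nemenyi post-hoc test."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))


# Hypothetical data: rows are repetitions, columns are three made-up methods.
acc = [
    [0.95, 0.90, 0.80],
    [0.96, 0.91, 0.82],
    [0.94, 0.92, 0.81],
    [0.97, 0.89, 0.83],
]

mean_ranks, chi2 = friedman(acc)
cd = nemenyi_cd(2.343, k=3, n=len(acc))  # 2.343: assumed table value for k=3
print(mean_ranks, chi2, round(cd, 3))
```

With these made-up numbers the Friedman statistic is 8.0, which exceeds the chi-square critical value of 5.991 for 2 degrees of freedom at alpha = 0.05. In the Nemenyi step, only pairs of methods whose mean ranks differ by more than about 1.66 would be declared different.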