Training set optimization of genomic prediction by means of EthAcc

Bibliographic details
Published in:	PloS one 2019-02, Vol.14 (2), p.e0205629-e0205629
Main authors:	Mangin, Brigitte; Rincent, Renaud; Rabier, Charles-Elie; Moreau, Laurence; Goudemand-Dugue, Ellen
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Analysis; Animal behavior; Animal breeding; Beta vulgaris - genetics; Biology and Life Sciences; Comparative analysis; Computer Simulation; Corn; Engineering and Technology; Environmental Sciences; Genetics; Genome; Genome-wide association studies; Genome-Wide Association Study; Genomes; Genomics; Genomics - methods; Genotype; Helianthus - genetics; Life Sciences; Mathematical models; Medical research; Models, Genetic; Optimization; Optimization theory; Phenotype; Physical Sciences; Plant breeding; Plant Breeding - methods; Population; Quantitative genetics; Quantitative Trait Loci; Research and Analysis Methods; Sugar beets; Training; Triticum - genetics; Vegetal Biology; Wheat; Zea mays - genetics
Online access:	Full text

Description
Abstract:	Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in case of a structured population, in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as a start of the optimization algorithm. EthAcc's precision and algorithm issues prevent it from reaching a good training set with a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0205629
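
Illustrative note: the abstract mentions training set optimization started from a random set. The sketch below shows one generic way such a search can be organized, as a simple exchange (swap) algorithm in Python with NumPy. It is not the authors' implementation, and the criterion used here is neither EthAcc nor CDmean: `mean_pev` is a stand-in that scores a training set by the mean ridge-regression prediction error variance (up to a constant scale factor) of a target set. All names (`mean_pev`, `exchange_optimize`, `lam`, `n_iter`) and the simulated genotypes are assumptions made for illustration only.

```python
import numpy as np


def mean_pev(X_train, X_target, lam=1.0):
    """Stand-in criterion (NOT EthAcc or CDmean): mean of
    x' (X_train' X_train + lam * I)^-1 x over the target individuals,
    i.e. ridge prediction error variance up to a scale factor."""
    p = X_train.shape[1]
    A = X_train.T @ X_train + lam * np.eye(p)
    B = np.linalg.solve(A, X_target.T)  # shape (p, n_target)
    return float(np.mean(np.sum(X_target.T * B, axis=0)))


def exchange_optimize(X_cand, X_target, n_train, n_iter=200, lam=1.0, seed=0):
    """Random-start exchange search: swap one selected candidate for one
    unselected candidate and keep the swap only if the criterion decreases."""
    rng = np.random.default_rng(seed)
    n = X_cand.shape[0]
    selected = rng.choice(n, size=n_train, replace=False)
    start = best = mean_pev(X_cand[selected], X_target, lam)
    for _ in range(n_iter):
        pool = np.setdiff1d(np.arange(n), selected)  # unselected candidates
        trial = selected.copy()
        trial[rng.integers(n_train)] = rng.choice(pool)
        score = mean_pev(X_cand[trial], X_target, lam)
        if score < best:
            selected, best = trial, score
    return np.sort(selected), start, best


if __name__ == "__main__":
    # Simulated, column-centered 0/1/2 marker genotypes (hypothetical data).
    rng = np.random.default_rng(1)
    G = rng.integers(0, 3, size=(60, 20)).astype(float)
    G -= G.mean(axis=0)
    candidates, target = G[:50], G[50:]
    chosen, start, best = exchange_optimize(candidates, target, n_train=15)
    print("selected candidates:", chosen)
    print("criterion at random start:", start, "after search:", best)
```

Because a swap is accepted only when the criterion improves, the result depends on the starting set, which is the kind of sensitivity to the start of the optimization that the abstract reports.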