Single nucleotide polymorphism calling and imputation strategies for cost‐effective genotyping in a tropical maize breeding program

Bibliographic details
Published in: Crop Science 2020-11, Vol. 60 (6), p. 3066-3082
Main authors: Oliveira, Amanda Avelar; Guimarães, Lauro José Moreira; Guimarães, Claudia Teixeira; Guimarães, Paulo Evaristo de Oliveira; Pinto, Marcos de Oliveira; Pastina, Maria Marta; Margarido, Gabriel Rodrigues Alves
Format: Article
Language: English
Online access: Full text
Description
Summary: Genotyping‐by‐sequencing (GBS) datasets typically feature high rates of missingness and heterozygote undercalling, prompting the use of data imputation. We compared the accuracy of four imputation methods, namely NPUTE, Beagle, k‐nearest neighbors imputation (KNNI), and fast inbred line library imputation (FILLIN), using GBS data of maize (Zea mays L.) inbred lines genotyped at different multiplexing levels. Two strategies each were evaluated for single nucleotide polymorphism (SNP) calling and for genotype imputation. In the first SNP‐calling strategy, only lines genotyped at 96‐plex were used for SNP discovery, whereas 96‐ and 384‐plex lines were used simultaneously in the second. In the first genotype imputation strategy, the 96‐plex lines were imputed on their own, the remaining lines were then appended (96‐plex‐imputed plus 384‐plex), and the combined set was imputed again. In the second imputation strategy, both datasets were imputed jointly. We also investigated the impacts of including heterozygous genotypes and of distinct rates of missing genotypes per locus. The different SNP‐calling strategies and percentages of missing data did not substantially affect imputation accuracy, but the imputation strategy did. Imputations were generally less accurate for heterozygotes. The 96‐plex‐imputed plus 384‐plex scenario showed accuracies similar to the 96‐plex scenario. Beagle and NPUTE produced the highest accuracies. Our results indicate that combining SNP‐calling and imputation strategies can enhance genotyping in a cost‐effective manner, resulting in higher imputation accuracies.
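The accuracy comparison described in the abstract above can be illustrated with a masking experiment: hide some known genotype calls, impute them, and count how many are recovered. The sketch below is only an illustration, not the study's pipeline. It assumes genotypes are coded 0/1/2 (alternate-allele count, with 1 the heterozygote) and -1 for missing. The names `knn_impute` and `masked_accuracy` are hypothetical, and the simple neighbor vote stands in for a k-nearest-neighbors imputer rather than reproducing KNNI, Beagle, NPUTE, or FILLIN.

```python
from collections import Counter

MISSING = -1  # assumed marker for a missing genotype call


def knn_impute(genotypes, k=3):
    """Fill each missing call with the majority call among the k most
    similar lines that have a call at that locus (hypothetical helper)."""
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        # Rank the other lines by mismatch rate over loci called in both.
        ranked = []
        for j, other in enumerate(genotypes):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a != MISSING and b != MISSING]
            if shared:
                mismatch = sum(a != b for a, b in shared) / len(shared)
                ranked.append((mismatch, j))
        ranked.sort()
        for locus, call in enumerate(row):
            if call != MISSING:
                continue
            votes = [genotypes[j][locus] for _, j in ranked
                     if genotypes[j][locus] != MISSING][:k]
            if votes:
                # On a tied vote, the call seen first (nearest line) wins.
                imputed[i][locus] = Counter(votes).most_common(1)[0][0]
    return imputed


def masked_accuracy(truth, masked, imputed, only=None):
    """Proportion of deliberately masked calls that were recovered.
    Pass only=1 to score heterozygous calls alone (under the assumed coding)."""
    hits = total = 0
    for t_row, m_row, i_row in zip(truth, masked, imputed):
        for t, m, i in zip(t_row, m_row, i_row):
            if m == MISSING and t != MISSING and (only is None or t == only):
                total += 1
                hits += int(t == i)
    return hits / total if total else float("nan")


# Toy data: six inbred lines (rows) by four loci (columns).
truth = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
masked = [row[:] for row in truth]
masked[0][1] = MISSING  # hide two known calls
masked[3][2] = MISSING

imputed = knn_impute(masked, k=2)
print(masked_accuracy(truth, masked, imputed))
```

Scoring with `only=1` restricts the comparison to heterozygous calls, which is one way to examine separately the genotype class the abstract reports as harder to impute.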
ISSN: 0011-183X, 1435-0653
DOI: 10.1002/csc2.20255