Data imputation and machine learning improve association analysis and genomic prediction for resistance to fish photobacteriosis in the gilthead sea bream

Bibliographic details
Published in: Aquaculture Reports 2021-07, Vol. 20, p. 100661, Article 100661
Main authors: Bargelloni, Luca; Tassiello, Oronzo; Babbucci, Massimiliano; Ferraresso, Serena; Franch, Rafaella; Montanucci, Ludovica; Carnier, Paolo
Format: Article
Language: English
Description
Abstract:
• 2bRAD data imputation yielded a larger SNP data set and allowed the identification of more QTLs for disease resistance.
• Machine learning-based phenotype prediction showed higher performance than Bayesian regression methods.
• Higher classification performance might be due to non-additive effects.

Disease resistance represents a key trait for breeding programs in aquaculture species. Here we re-analysed 2bRAD sequence data from two experimental challenges of gilthead sea bream with Photobacterium damselae subsp. piscicida. Using a high-quality reference genome, we carried out variant calling and data imputation with Beagle to obtain a large set of SNPs (80,744). This allowed the identification of eight novel QTLs for resistance to photobacteriosis across different chromosomes and revealed a highly polygenic genetic architecture. Bayesian regression approaches and machine learning methods (support vector machines and linear bagging) were compared to evaluate their relative performance in classifying individuals as susceptible or resistant. In both data sets, machine learning methods, particularly linear bagging, showed higher Matthews correlation coefficient (MCC) and accuracy values, with a 20–70% increase in prediction performance. Overall, machine learning methods should be explored in parallel with parametric regression approaches to increase the chances of highly effective genomic prediction.
ISSN:2352-5134
DOI:10.1016/j.aqrep.2021.100661
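
Note on the metrics: the abstract compares classifiers by Matthews correlation coefficient (MCC) and accuracy. As a minimal sketch of how these two scores are computed for a binary susceptible/resistant classification, the Python below derives both from confusion-matrix counts. The label coding (1 = resistant, 0 = susceptible), the function names, and the toy labels are illustrative assumptions and are not taken from the article or its data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels.

    Assumed coding (hypothetical): 1 = resistant, 0 = susceptible.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of individuals classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it is less easily inflated when one class is much more frequent than the other.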