Leveraging LASSO-Based Methodologies for Enhanced SNP Analysis in Plant Genomes

Bibliographic details
Published in: Bioinformatics Advances 2025-02
Main authors: Puthiyedth, Nisha, Zeinalinesaz, Farshad, Hou, Dongdong, Zhang, Yue, Lin, Wenjun, Yan, Yan
Format: Article
Language: English
Online access: Full text
Description
Abstract: Genome-wide association studies (GWAS) have been widely used to reveal the associations between genetic variations and phenotypes in a population of individuals. However, they have been criticized for missing important genetic markers, usually because the data may not fit the statistical models well. In this study, we address the challenge of identifying significant single nucleotide polymorphisms (SNPs) in GWAS by harnessing the capabilities of two sophisticated regression models, BIGLASSO and AUTALASSO. They are both variants of the least absolute shrinkage and selection operator (LASSO). Our research contributes to the field of genomics through detailed comparative analysis of Arabidopsis thaliana, revealing how each method specializes in uncovering SNPs for different trait types. Our findings indicate that BIGLASSO shows stronger alignment with GWAS results, particularly excelling in the analysis of binary traits, even when these are derived from categorical phenotypes. AUTALASSO could be effective for quantitative traits and complement GWAS. We demonstrate that these LASSO-based methods can significantly enhance the identification of genetic markers, offering a potent complement to traditional GWAS approaches. Our findings not only bridge the gap between statistical and machine learning methodologies in genetic studies but also provide a practical framework for researchers seeking to validate reported SNPs or explore new genomic regions for trait association. This work stands as a pivotal step towards the integration of advanced computational techniques in genomics, paving the way for more precise and comprehensive genetic analyses.
ISSN: 2635-0041
DOI: 10.1093/bioadv/vbaf014
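
Illustrative note: the abstract describes BIGLASSO and AUTALASSO as variants of LASSO used to pick out trait-associated SNPs. The sketch below is not either of those methods and is not taken from the article. It is a plain LASSO fitted by coordinate descent on a tiny, hand-made genotype matrix, meant only to show how the L1 penalty shrinks small SNP effects to exactly zero. The function names, the toy data, and the penalty value `lam = 0.5` are all assumptions made for illustration.

```python
# Minimal sketch (assumption: plain LASSO, not BIGLASSO or AUTALASSO).
# Objective: (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 on centered data.

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Fit LASSO coefficients by cyclic coordinate descent (pure Python)."""
    n, p = len(y), len(X[0])
    # Center each SNP column and the phenotype so no intercept is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # residual when all betas are 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = [Xc[i][j] for i in range(n)]
            scale = sum(v * v for v in xj) / n
            if scale == 0:
                continue  # monomorphic SNP carries no information
            # Correlation of SNP j with the partial residual (beta_j removed).
            rho = sum(xj[i] * residual[i] for i in range(n)) / n + scale * beta[j]
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0:
                residual = [residual[i] - delta * xj[i] for i in range(n)]
                beta[j] = new_beta
    return beta


# Hypothetical toy data: 8 individuals, 3 SNPs coded as allele counts (0 or 2).
genotypes = [
    [0, 0, 0], [0, 0, 2], [0, 2, 0], [0, 2, 2],
    [2, 0, 0], [2, 0, 2], [2, 2, 0], [2, 2, 2],
]
# Phenotype with a large effect at SNP 0, a small effect at SNP 2, none at SNP 1.
phenotype = [10 + 3 * g[0] + 0.2 * g[2] for g in genotypes]

beta = lasso_coordinate_descent(genotypes, phenotype, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0]
print("selected SNP indices:", selected)
```

With this penalty, only the SNP with the large effect keeps a nonzero coefficient, and that coefficient is shrunk below its true value of 3. A weaker penalty would also retain the small-effect SNP, which is why the choice of `lam` matters in practice.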