Genome-wide exploratory analysis for NARAC dataset with preparation for haplotype block partitioning through minor allele frequency quality control viewpoint

Bibliographic details
Published in: Iran Journal of Computer Science (Online), 2023-12, Vol. 6 (4), p. 387-396
Main authors: Saad, Mohamed N., Zareef, Galena W., Ibrahim, Fatma S., Said, Ashraf M., Hamed, Hisham F. A.
Format: Article
Language: English
Description
Abstract: This article provides a detailed description, analysis, and visualization of a case–control genome-wide genotypic dataset from the North American Rheumatoid Arthritis Consortium (NARAC). The data is presented in terms of the number of females and males in both cases and controls, as well as the percentage of missing data. The number of alleles and genotypes is also counted, and the minor allele frequency (MAF) is calculated for each single nucleotide polymorphism (SNP). The data is further classified into four categories based on the SNP's MAF, namely, very rare, rare, low frequency, and common SNPs. The regions of these categories in the chromosome are investigated to determine the proportion of SNPs in coding locations and other regions. It is observed that each category has a different proportion in each region of consequence annotation. The data composition in terms of alleles and genotypes is found to be greatly disproportionate. The results present clear insights into the data and its MAF, which can be compared with other datasets. These findings can aid researchers in gaining a comprehensive understanding of such case–control datasets and bring accurate insights into the data.
ISSN: 2520-8438, 2520-8446
DOI: 10.1007/s42044-023-00147-8
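
The abstract mentions computing the MAF for each SNP and sorting SNPs into four categories (very rare, rare, low frequency, common). The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the two-letter genotype encoding, the `"00"` missing-call marker, and especially the three cut-off values are assumptions made for illustration; the record does not state the thresholds used in the article.

```python
from collections import Counter

# Hypothetical category boundaries: the abstract names the four
# categories but does not give the MAF cut-offs, so these are assumed.
VERY_RARE_MAX = 0.001
RARE_MAX = 0.01
LOW_FREQ_MAX = 0.05


def minor_allele_frequency(genotypes):
    """Return the MAF of one biallelic SNP, or None if every call is missing.

    `genotypes` is a list of two-letter strings such as "AG".
    None or "00" marks a missing call (an assumed encoding) and is skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None or genotype == "00":
            continue
        counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) > 2:
        raise ValueError("expected a biallelic SNP")
    if len(counts) == 1:
        return 0.0  # monomorphic: the minor allele is never observed
    return min(counts.values()) / total


def classify_snp(maf):
    """Map a MAF value to one of the four categories named in the abstract."""
    if maf < VERY_RARE_MAX:
        return "very rare"
    if maf < RARE_MAX:
        return "rare"
    if maf < LOW_FREQ_MAX:
        return "low frequency"
    return "common"
```

For example, `classify_snp(minor_allele_frequency(["AA", "AG", "GG", "AA"]))` counts five A alleles and three G alleles, giving a MAF of 3/8, which falls in the "common" category under the assumed cut-offs.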