Computational protocol: Genetic Variants That Confer Resistance to Malaria Are Associated with Red Blood Cell Traits in African-Americans: An Electronic Medical Record-based Genome-Wide Association Study

Protocol publication

[…] When multiple measurements of an RBC trait were available for an individual patient, we chose the median value and the corresponding age for the genetic analyses. We performed association analyses by using linear regression implemented in PLINK, assuming additive genetic effects, with adjustment for age, sex, site, and population substructure (i.e., the first two principal components [PCs]). These PCs were generated by principal component analysis (PCA), a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (an n × m matrix, i.e., a sample × genotype matrix) into a set of values of linearly uncorrelated variables called PCs. For SNPs on the X chromosome, alleles A and B were coded (A → 0; B → 1) in males and (AA → 0; AB → 1; BB → 2) in females, and sex was additionally included as a covariate. We estimated the regression coefficient (β) and the proportion of variance in an RBC trait explained by a variant (i.e., R²). With a sample size of 1904 in the discovery cohort, the statistical power to detect a quantitative trait locus that explained ~2% of the variance in an RBC trait was 80% at a significance level of 5 × 10⁻⁸.

Given the known correlation between RBC traits, we performed PCA to identify the main vectors along which the RBC traits lie. These vectors were then used as phenotypes in the association analyses, with the same adjustments as for the single-SNP analyses.

Because RBC traits can be affected by a wide array of medical conditions, we also performed analyses in the subset of patients (n = 2005) in whom relevant comorbid conditions, specific medications, and blood loss were absent. To do this, we employed a previously developed algorithm based on billing codes and natural language processing of unstructured clinical notes to exclude RBC trait values affected by comorbidities, medications, or blood loss. The phenotyping algorithm is available online, and the values of RBC traits before and after implementing the algorithm are provided in Supporting Information, Table S1.

Patterns of linkage disequilibrium (LD) were analyzed based on HapMap Phase II YRI for chromosome X and the 1000 Genomes YRI for autosomal chromosomes via the LocusZoom software. Map and pedigree files were not available for chromosome X in the 1000 Genomes YRI in the LocusZoom package. […]
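The X-chromosome coding and the additive model described above can be illustrated with a minimal sketch. It is not the PLINK implementation: the function names `code_x_genotype` and `additive_fit` are hypothetical, the six records are invented toy values, and the covariates used in the protocol (age, sex, site, and the first two PCs) are omitted to keep the example short.

```python
from statistics import mean


def code_x_genotype(genotype: str, sex: str, effect_allele: str = "B") -> int:
    """Count copies of the effect allele: males carry one allele (0/1),
    females carry two (0/1/2), matching the coding in the protocol text."""
    expected_len = 1 if sex == "M" else 2
    if len(genotype) != expected_len:
        raise ValueError(f"genotype {genotype!r} does not match sex {sex!r}")
    return genotype.count(effect_allele)


def additive_fit(dosages, trait):
    """Ordinary least squares of trait on allele dosage (no covariates).
    Returns the slope (beta) and the proportion of variance explained (R^2)."""
    mx, my = mean(dosages), mean(trait)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return beta, r2


# Toy data (assumption): (sex, X-chromosome genotype, trait value)
samples = [
    ("M", "A", 13.1), ("M", "B", 12.4), ("F", "AA", 13.0),
    ("F", "AB", 12.6), ("F", "BB", 12.1), ("M", "A", 13.3),
]
dosages = [code_x_genotype(g, s) for s, g, _ in samples]
trait = [t for _, _, t in samples]
beta, r2 = additive_fit(dosages, trait)
print(dosages, round(beta, 3), round(r2, 3))
```

In a real analysis the covariates would enter the same regression as additional predictors, which is what PLINK's linear model handles; the single-predictor form here only shows how β and R² relate to the coded dosage.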
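The power statement above (n = 1904, ~2% variance explained, significance level 5 × 10⁻⁸) can be sanity-checked with a short calculation. This is a sketch under an assumption the protocol does not state: a 1-degree-of-freedom test whose noncentrality parameter is approximated as n·R²/(1 − R²). The function name `qtl_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def qtl_power(n: int, r2: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df association test.
    Assumption: the test statistic is a squared normal variate whose mean
    shift is sqrt(n * r2 / (1 - r2))."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = sqrt(n * r2 / (1 - r2))       # square root of the noncentrality
    # Probability that |Z + shift| exceeds the critical value
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


power = qtl_power(n=1904, r2=0.02, alpha=5e-8)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the result lands in the neighbourhood of the 80% reported in the protocol; the authors' exact method and software for the power calculation are not given in this excerpt, so small differences are expected.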

Pipeline specifications

Software tools: PLINK, LocusZoom
Application: GWAS
Organisms: Homo sapiens
Diseases: Malaria