Computational protocol: Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs

Protocol publication

[…] We applied PCA and a Bayesian clustering approach, as implemented in the programs EIGENSOFT and STRUCTURE (v2.2), respectively, together with Fst, to investigate potential population structure. We used Wright's population differentiation estimator Fst to ensure ploidy independence. To run STRUCTURE, we applied the same conditions described previously. Briefly, ten runs of 50,000 burn-in steps and 100,000 iterations were performed for K = 1 to 10 using the admixture model. For PCA, we used the LD correction and calculated the top 10 eigenvectors, or principal components (PCs), from the genotypes of the African, Asian, and American populations. We identified and removed isolates that lay more than 6 standard deviations from the PC mean along any of the top 5 PCs, and repeated the PCA calculation and outlier detection for 10 iterations. [...]

The individual populations were analyzed for association with the seven antimalarial drugs using EigenstratQTL in the EIGENSOFT package, utilizing PCA to control for population structure within the populations. Population structure was corrected using three, one, and zero significant PCs for the Asian, African, and American populations, respectively. The correction is a function of each sample's position along a PC and the regression of genotypes on that position; it adjusts both genotypes and phenotypes and effectively eliminates population structure within each individual population. The correction for the genotype at SNP i in sample j is:

$$ g_{ij,\mathrm{adjusted}} = g_{ij} - \gamma_i a_j, \qquad \gamma_i = \frac{\sum_j a_j g_{ij}}{\sum_j a_j^2} $$

where a_j is the ancestry (position) of individual j along the PC. The test statistic is (N − K) × correlation(corrected genotypes, corrected phenotypes)^2, where N is the number of isolates (N = 133) and K is the number of PCs used for correction (K = 3). The correlation between corrected genotypes and corrected phenotypes was obtained with the top 3 PCs as fixed effects. Nominal P-values were determined using the chi-square distribution with df = 1.
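As a rough illustration of the correction and test statistic described above, the following Python sketch regresses one PC at a time out of a genotype vector and a phenotype vector, then computes (N − K) × r² and a nominal P-value from the chi-square distribution with 1 degree of freedom. The function names (`adjust`, `pearson_r`, `association_stat`) are hypothetical and this is not the EIGENSOFT implementation. It assumes the PCs are mutually orthogonal, so that removing them one after another is equivalent to removing them jointly.

```python
import math


def adjust(values, ancestry):
    """Subtract gamma * a_j from each value, with
    gamma = sum_j(a_j * v_j) / sum_j(a_j ** 2)."""
    denom = sum(a * a for a in ancestry)
    gamma = sum(a * v for a, v in zip(ancestry, values)) / denom
    return [v - gamma * a for v, a in zip(values, ancestry)]


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_stat(genotypes, phenotypes, pcs):
    """Return ((N - K) * r^2, nominal P) for a single SNP.

    genotypes, phenotypes: one value per isolate.
    pcs: list of K per-isolate PC coordinate vectors (assumed orthogonal).
    """
    g, p = list(genotypes), list(phenotypes)
    for pc in pcs:
        g = adjust(g, pc)
        p = adjust(p, pc)
    n, k = len(g), len(pcs)
    r = pearson_r(g, p)
    stat = (n - k) * r * r
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval
```

With K = 0 (as for the American population) the adjustment loop does nothing and the statistic reduces to N × r². A SNP or phenotype with no variation after adjustment would cause a division by zero here and would need to be filtered beforehand.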
Bonferroni P-values were determined as 1 − (1 − nominal P-value)^(number of successful tests). Association analysis was also performed using the software PLINK. Because PLINK does not include PCA correction in its test, population outliers from the PCA analysis (those outside the circle in ) were removed before association analysis. A linear regression was fitted for each SNP to test its association with the in vitro IC50 values of the seven antimalarial drugs. Significant SNPs (P < 0.05) were determined after Bonferroni correction. Quantile-quantile plots for both methods were obtained by contrasting the uncorrected and, where applicable, corrected experimental P-value distributions with the expected uniform distribution on 0 to 1. […]
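The multiple-testing adjustment and the quantile-quantile comparison described above can be sketched in a few lines of Python. The formula 1 − (1 − p)^m is taken from the text (this form is often referred to as the Šidák adjustment). The function names, the (i + 0.5)/m plotting positions and the −log10 scale are assumptions made for illustration, not details given in the protocol.

```python
import math


def corrected_p(nominal_p, n_tests):
    """Corrected P-value 1 - (1 - p)^m for m successful tests.

    log1p/expm1 keep precision when p is very small.
    """
    if nominal_p >= 1.0:
        return 1.0
    return -math.expm1(n_tests * math.log1p(-nominal_p))


def qq_points(pvalues):
    """Return (expected, observed) pairs on the -log10 scale.

    Expected values come from a uniform(0, 1) null using the
    plotting positions (i + 0.5) / m, which is an assumed convention.
    All P-values must be greater than 0.
    """
    observed = sorted(pvalues)
    m = len(observed)
    return [
        (-math.log10((i + 0.5) / m), -math.log10(p))
        for i, p in enumerate(observed)
    ]
```

A SNP would then be called significant when `corrected_p(p, m) < 0.05`, and points from `qq_points` lying well above the diagonal would indicate an excess of small P-values relative to the uniform expectation.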

Pipeline specifications

Applications: Population genetic analysis, GWAS
Organisms: Plasmodium falciparum
Diseases: Malaria