Computational protocol: Genome-Wide Association Analysis of Radiation Resistance in Drosophila melanogaster

Protocol publication

[…] We treated the radiation response phenotype as a binary outcome (resistant or sensitive) because 60% of the data had a numerical value of zero. A subset of resistant lines was highly variable (see Results). However, this variability did not affect our data analysis, because the variable lines exhibited a mean resistant phenotype after several replications. Similarly, although the temporal phenotypic stability analysis also showed variability in highly resistant lines, no highly resistant line became sensitive. Hence, variability in the resistance phenotype did not alter the number of cases and controls for association analysis.

There were a total of 5,066,519 SNPs in the DGRP freeze 1 dataset. Genotype data were cleaned by the following criteria: genotype missingness <15%, heterozygous haploids and genotype call rate >10%, and minor allele frequency >1%. There were 2,035,449 SNPs filtered out by quality controls.

Potential confounding factors (Wolbachia infection status, population stratification, and cryptic relatedness) were considered. Wolbachia infection status had no significant effect. To examine whether population structure is an influential confounder, we first derived the top five principal components from all the 2,035,449 SNPs using GCTA; we then tested the association between radiation resistance and each principal component. The results suggested that population structure is not a significant confounder. To thoroughly control for population stratification and cryptic relatedness, we employed a linear mixed model that uses the whole-genome data to estimate the genetic relationship matrix and includes the top five principal components as covariates. In sum, the association of radiation response in 154 lines with 3,030,570 SNPs was examined by the likelihood ratio test, fitting a linear mixed model using GEMMA.

To determine the threshold for genome-wide significance, we calculated the number of haplotype blocks in the DGRP genome. A haplotype block is defined as a window of SNPs in which the outermost markers are required to be in strong linkage disequilibrium (LD), with the upper limit of the 90% confidence interval exceeding 0.98 and the lower limit of the 90% confidence interval exceeding 0.7. The calculation was performed using PLINK. Note that pairwise LD was calculated only for SNPs within 500 kb of each other; thus, the number of LD blocks obtained is an upper limit. The narrow-sense heritability from additive genetic effects was estimated by a liability threshold model using the whole-genome variation. We also estimated heritability by fitting traditional linear mixed models that model line effects. […]
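
The missingness and minor-allele-frequency criteria in the text can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a hypothetical lines-by-SNPs matrix of allele counts (0, 1, 2) with `np.nan` marking missing calls, and it covers only two of the stated filters (missingness <15% and minor allele frequency >1%). The function name `qc_filter` and its parameters are illustrative.

```python
import numpy as np


def qc_filter(genotypes, max_missing=0.15, min_maf=0.01):
    """Return a boolean mask marking the SNPs (columns) that pass both filters.

    genotypes: lines x SNPs array of allele counts (0, 1, 2); np.nan = missing.
    This input layout is an assumption made for illustration.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)

    # Fraction of lines with no genotype call at each SNP.
    missing_rate = 1.0 - n_called / g.shape[0]

    # Alternate-allele frequency among called genotypes (two alleles per call).
    alt_sum = np.where(called, g, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_sum / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    # SNPs with no calls have NaN frequency and fail the comparison below.
    return (missing_rate < max_missing) & (maf > min_maf)


# Toy data: 4 lines x 3 SNPs (polymorphic, monomorphic, half missing).
toy = np.array([
    [0.0, 0.0, np.nan],
    [2.0, 0.0, np.nan],
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 2.0],
])
mask = qc_filter(toy)
print(mask)
```

In the toy matrix only the first SNP is retained: the second is monomorphic and the third is missing in half of the lines.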
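
The text derives the genome-wide significance threshold from the number of haplotype blocks but does not give the formula. The sketch below assumes a Bonferroni-style correction (a nominal alpha divided by the number of blocks), which is an assumption and not something the excerpt confirms. It also assumes Haploview-style block output in which each block occupies one line beginning with `*`. The function names and the example lines are hypothetical.

```python
def count_blocks(lines):
    """Count haplotype blocks, assuming one block per line starting with '*'."""
    return sum(1 for line in lines if line.lstrip().startswith("*"))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical block listing; real input would be read from a block file.
example_lines = [
    "* snp1 snp2 snp3",
    "* snp7 snp8",
    "",
    "* snp12 snp13 snp14 snp15",
]
n_blocks = count_blocks(example_lines)
threshold = bonferroni_threshold(n_blocks)
print(n_blocks, threshold)
```

Because the block count in the text is described as an upper limit, a threshold computed this way would err on the conservative side.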

Pipeline specifications

Software tools: GCTA, GEMMA, PLINK
Application: GWAS
Organisms: Drosophila melanogaster, Homo sapiens
Diseases: Neoplasms, Drug-Related Side Effects and Adverse Reactions