Computational protocol: Investigation of genetic variation and lifestyle determinants in vitamin D levels in Arab individuals

Protocol publication

[…] The effects of various parameters were investigated using linear regression analyses as implemented in the R environment. Vitamin D levels of all subjects were adjusted for sex, age, number of years in Kuwait, triglyceride levels, total cholesterol levels, and hip circumference. The residuals were normalized using the rntransform function of the GenABEL package, which performs quantile normalization of the residuals from a generalized linear model analysis.

In addition, conditional inference-based recursive partitioning (implemented in the R "party" package) was used to determine the influence of the genetic variables on normalized vitamin D levels (as a quantitative trait). This approach searches for predictor variables with main effects and higher-order interactions. Stepwise modeling and splitting were applied to produce a classification tree that showed how each genotype affected vitamin D levels. The analysis was divided into two phases. In the first phase, we analyzed each gene separately to identify which single-nucleotide polymorphism (SNP) had the largest effect on vitamin D levels. In the second phase, we constructed a regression tree using only the SNPs identified as significant in the phase 1 analyses after Bonferroni correction. At each node, the final tree splits on the variable with the highest statistical significance. Regression trees allow a hierarchy of the variables to be constructed: the first node corresponds to the variable with the strongest effect, and the second and third nodes correspond to variables with a significant impact on the phenotype. Recursive partitioning is widely used in medicine and genetics to describe interactions between variables.

The genotype distributions and derived allele frequencies of the SNPs of interest were compared with the corresponding distributions in the 1000 Genomes Project reference dataset.
The distribution of the combined annotation-dependent depletion (CADD) score for each variant was obtained. […]
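
The adjustment and normalization step described at the start of this excerpt can be illustrated with a small sketch. The original analysis was run in R with GenABEL's rntransform; the Python code below is not that function, only a stdlib approximation of the same idea under stated assumptions: residuals are taken from a least-squares fit on a single covariate (the study adjusted for six), and the rank-to-quantile mapping uses the common (rank - 0.5) / n convention. All function names and data values here are hypothetical.

```python
from statistics import NormalDist, mean


def residuals_one_covariate(y, x):
    """Residuals of y after a least-squares fit on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_normal_transform(values):
    """Map values to standard-normal quantiles via (rank - 0.5) / n (assumed convention)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Made-up vitamin D levels and ages, for illustration only.
vitamin_d = [12.0, 30.5, 18.2, 25.1, 9.8, 40.3]
age = [34, 51, 29, 45, 38, 60]

resid = residuals_one_covariate(vitamin_d, age)
normalized = rank_normal_transform(resid)
```

The transform keeps the ordering of the residuals but replaces their values with normal quantiles, so downstream tests that assume normally distributed responses are less affected by skew or outliers in the raw vitamin D measurements.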

Pipeline specifications

Software tools: GenABEL, CADD
Applications: WGS analysis, GWAS
Chemicals: Vitamin D