Computational protocol: Systematic Analysis of Circadian Genes in a Population-Based Sample Reveals Association of TIMELESS with Depression and Sleep Disturbance

Protocol publication

[…] Genomic DNA was isolated from peripheral blood leukocytes using a standard EDTA extraction procedure. The circadian genes were chosen on the basis of the literature. SNPs within these genes were selected using the International HapMap database (www.hapmap.org). CEPH (Centre d'Etude du Polymorphisme Humain) genotype data from the International HapMap Project (International HapMap Consortium) were used to cover the haplotype tagging SNPs (tagSNPs). We implemented the pairwise tagging method with an r² threshold of 0.8 and a minor allele frequency (MAF) threshold of 0.1. For large genes (more than 50 tagging SNPs), such as NPAS2 and RORA, tagSNPs were selected so that they were evenly spaced throughout the gene. The flanking regions of the DNA sequences were derived from SNPper. The extension primers for polymerase chain reaction were designed with MassARRAY Assay Design 3.1 software (Sequenom Inc., San Diego, CA, USA).

SNP genotyping was performed using the MassARRAY iPLEX Gold platform (Sequenom Inc.) following the manufacturer's guidelines, in 24- to 34-plex reactions in 384-well plates, using a total reaction volume of 5 µl including 12.5 ng of genomic DNA. Genotype quality was analyzed using MassARRAY Typer 4.0 software (Sequenom Inc.) and verified manually. As quality controls, eight duplicated DNA samples and eight water controls were included in each plate. The overall average genotyping success rate for the SNP data was ≥95%, and MAF was ≥5%.

Hardy-Weinberg equilibrium was monitored using Haploview version 4.1, and a cutoff of p<0.05 was applied. Seven SNPs [rs17374292 (PER3); rs6722909, rs12712085 (NPAS2); rs7950226, rs2278749 (ARNTL); rs7137588, rs17413842 (ARNTL2)] failed the Hardy-Weinberg equilibrium test and were excluded from further analyses.

The second study sample was genotyped with the Illumina 610K platform (Illumina Inc., San Diego, CA, USA). The call rate was >95% for both individuals and markers.
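
The Hardy-Weinberg filter applied to the first sample (cutoff p<0.05) can be illustrated with a small sketch. This is not Haploview's implementation: it is the textbook one-degree-of-freedom chi-square goodness-of-fit test, so its p-values can differ from those Haploview reports. The function names and the genotype counts in the usage lines are hypothetical and serve only as an illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the three observed genotype counts of a biallelic SNP and
    returns (chi2, p_value) for a test with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0              # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_hwe(n_aa, n_ab, n_bb, cutoff=0.05):
    """Keep a SNP only if its HWE p-value is not below the cutoff."""
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return p_value >= cutoff


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB) for two SNPs.
    print(passes_hwe(25, 50, 25))  # counts exactly at HWE proportions
    print(passes_hwe(50, 0, 50))   # no heterozygotes at all
```

Passing a stricter `cutoff` reuses the same check for other Hardy-Weinberg thresholds.
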
Markers with MAF <1% or Hardy-Weinberg p <1×10⁻⁶ had been excluded. [...]

We compared the allele frequencies between cases and controls using chi-square tests as implemented in the PLINK software package, web-based version 1.06 (http://pngu.mgh.harvard.edu/purcell/plink/). To exclude possible false-positive results, PLINK's max(T) permutation test with 10,000 permutations was used to generate empirical p-values and to correct for multiple testing. Power calculations show that our sample was powered to detect associations for variants that increase the risk of depression ≥1.5-fold in females and ≥1.8-fold in males at α = 0.05 and β = 0.80, and ≥2.0-fold in females and ≥2.7-fold in males at β = 0.80 and α = 0.05/(2×113×3) = 0.000073, the threshold for statistical significance when the number of tests is taken into account.

In the single-locus analysis, we compared the following groups: (1) all depressed patients against all controls (D+ vs. D–), (2) depressed patients with early morning awakening against controls without early morning awakening (D+EMA+ vs. D–EMA–), and (3) depressed patients with fatigue against controls without fatigue (D+FAT+ vs. D–FAT–). To check for gender-dependent and symptom-specific differences in the genetic background of depression, females and males were analyzed separately.

Subsequently, we performed a descriptive analysis and compared the allelic frequencies of gene variants that gave any evidence for an association (P<0.05, not corrected for multiple testing).
The following non-overlapping groups were analyzed: (1) D+EMA–FAT– (n = 41 females and 16 males), (2) D+EMA+FAT+ (n = 94 females and 58 males), (3) D+EMA+FAT– (n = 15 females and 3 males; owing to the small number of males in this group, we did not examine their allelic frequencies), (4) D+EMA–FAT+ (n = 91 females and 33 males), and (5) controls, D–EMA–FAT– (n = 578 females and 481 males).

Factors 1 and 2 from GSS were analyzed using linear regression models that included age and affection status as covariates. The model was constructed separately for all females (n = 967) and all males (n = 687). We ran a similar analysis for the second sample set, with metabolic disorder status and age as covariates.

We also used Haploview (V.4.1) to determine the pairwise LD structure for all genotyped variations within each studied circadian gene. We then performed two-SNP, three-SNP and four-SNP haplotype association analyses using SNPs in genes that gave an association of P<0.05 in the single-locus analyses of the first sample set.

Furthermore, the logistic regression model implemented in the PLINK software package (V.1.06) was used to investigate the interaction between gender and variants of the most significantly associated gene with respect to the study phenotypes. The same model was used for SNP-SNP interaction analyses of SNPs that gave significant results when comparing D+FAT+ females vs. D–FAT– females, and D+EMA+ males vs. D–EMA– males.

Finally, to search for transcription factor binding sites within the TIMELESS gene, which was differentially associated with depression and sleep-related problems in the two genders, we used ConSite, a platform-independent web resource. The corresponding regulatory regions of human (ENSG00000111602) and target mouse (ENSMUSG00000039994) were retrieved using a genome browser such as EnsEMBL (www.ensembl.org), and the retrieved orthologous pairs of genomic sequences were re-aligned using the ORCA aligner.
We then examined the transcription factor binding sites shared by this gene. Only vertebrate transcription factors with a minimum specificity of 10 bits and a TF score threshold of 90% were considered, and only in parts of the sequences showing at least 90% conservation between the species. […]
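
The allelic case-control comparison and the multiple-testing threshold described in this protocol can be sketched as follows. This is a minimal illustration, not PLINK's code: it assumes a plain Pearson chi-square test on a 2×2 table of allele counts (1 degree of freedom, no continuity correction) and omits the max(T) permutation step. The function names and the allele counts are hypothetical. The divisor 2×113×3 is taken directly from the text; reading it as sexes × SNPs × comparisons is an assumption.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls, columns are alleles A and B.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column needs at least one allele")
    cross = case_a * ctrl_b - case_b * ctrl_a
    chi2 = n * cross ** 2 / (row_case * row_ctrl * col_a * col_b)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after dividing alpha by the test count."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical allele counts: cases (A, B), then controls (A, B).
    chi2, p = allelic_chi_square(60, 40, 40, 60)
    threshold = bonferroni_threshold(0.05, 2 * 113 * 3)
    print(chi2, p, threshold, p < threshold)
```

With the figures from the text, `bonferroni_threshold(0.05, 2 * 113 * 3)` gives about 7.37×10⁻⁵, which agrees with the reported 0.000073 when truncated rather than rounded.
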

Pipeline specifications