Computational protocol: GWAS of 89,283 individuals identifies genetic variants associated with self-reporting of being a morning person

Protocol publication

[…] For our standard GWAS, we restricted participants to individuals with >97% European ancestry, as determined through a local-ancestry analysis comparing against the three HapMap 2 populations. A maximal set of unrelated individuals was chosen for the analysis using a segmental identity-by-descent (IBD) estimation algorithm. Individuals were defined as related if they shared >700 cM IBD, counting regions where the two individuals share either one or both genomic segments IBD. This level of relatedness (roughly 20% of the genome) corresponds approximately to the minimal expected sharing between first cousins in an outbred population.

Participant genotype data were imputed against the August 2010 release of the 1000 Genomes reference haplotypes. First, we used Beagle (version 3.3.1) to phase batches of 8,000–9,000 individuals across chromosomal segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. We excluded SNPs with minor allele frequency <0.001, Hardy–Weinberg equilibrium P < 10⁻²⁰, call rate <95%, or large allele frequency discrepancies compared with the 1000 Genomes reference data. We identified these discrepancies by computing a 2 × 2 table of allele counts for the European 1000 Genomes samples and 2,000 randomly sampled 23andMe customers with European ancestry, and excluded SNPs with a χ² test P value < 10⁻¹⁵. We then assembled fully phased chromosomes by matching the phase of haplotypes across the overlapping segments. We imputed each batch against the European subset of 1000 Genomes haplotypes using Minimac (2011-10-27), with five rounds and 200 states for parameter estimation.

For the non-pseudoautosomal region of the X chromosome, males and females were phased together in segments, treating the males as already phased; the pseudoautosomal regions were phased separately. We assembled fully phased X chromosomes, representing males as homozygous pseudo-diploids for the non-pseudoautosomal region.
We then imputed males and females together using Minimac, as with the autosomes.

For morning and night person comparisons, we computed association test results by logistic regression assuming additive allelic effects. For tests using imputed data, we used the imputed dosages rather than best-guess genotypes. We included age, gender, and the top five principal components (PCs) as covariates to account for residual population structure. The GWAS association test P values were computed using a likelihood ratio test. Results for the X chromosome were computed similarly, with men coded as if they were homozygous diploid for the observed allele.

Imputed results were computed for 7,381,496 SNPs having an average imputation r² > 0.5 and a minimum within-batch r² > 0.3, after removing SNPs with evidence of a strong batch effect (P < 10⁻⁵⁰), measured by ANOVA of dosages versus batches. For genotyped SNPs, we identified 854,959 SNPs with a minor allele frequency >0.1%, call rate >90%, Hardy–Weinberg P > 10⁻²⁰ in European 23andMe participants, and P > 10⁻⁵⁰ for an effect of genotyping date on allele frequency. To create a single merged result set, for the 806,041 SNPs with both imputed and genotyped results passing these quality filters, we selected the imputed result. After applying these filters and removing a small number of results that did not converge, we were left with association test results for 7,427,422 SNPs. […]
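The association test described above (logistic regression on imputed dosages with covariates, with P values from a likelihood ratio test) can be sketched as follows. This is a minimal illustration in pure Python, not the authors' pipeline: the function names (`solve`, `fit_logistic`, `lrt_snp`, `simulate`), the Newton-Raphson fitting procedure, the simulated data, and the reduced covariate set (standardized age and sex only, without the five principal components) are all assumptions made for the example.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # y*eta - log(1 + exp(eta)), written in a numerically stable form
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik


def lrt_snp(dosage, covariates, y):
    """Likelihood ratio test (1 df) for an additive dosage effect.

    Returns (dosage coefficient, LRT statistic, P value).
    """
    null_X = [[1.0] + list(c) for c in covariates]       # intercept + covariates
    full_X = [row + [d] for row, d in zip(null_X, dosage)]  # ... + SNP dosage
    _, ll0 = fit_logistic(null_X, y)
    beta, ll1 = fit_logistic(full_X, y)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return beta[-1], stat, p_value


def simulate(n, effect, seed=0):
    """Simulate a hypothetical case/control sample (illustration only)."""
    rng = random.Random(seed)
    dosage, covs, y = [], [], []
    for _ in range(n):
        d = rng.uniform(0.0, 2.0)        # imputed dosage lies in [0, 2]
        age = rng.gauss(0.0, 1.0)        # standardized age
        sex = float(rng.random() < 0.5)
        eta = -0.3 + effect * d + 0.2 * age + 0.1 * sex
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
        dosage.append(d)
        covs.append((age, sex))
    return dosage, covs, y


if __name__ == "__main__":
    dosage, covs, y = simulate(2000, effect=0.8)
    print(lrt_snp(dosage, covs, y))
```

The null model contains only the intercept and covariates; the full model adds the dosage column, so twice the difference in log-likelihood is compared against a chi-square distribution with one degree of freedom. In this sketch, principal components would simply be additional covariate columns, and a male X-chromosome genotype would enter as a dosage of 0 or 2.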

Pipeline specifications

Software tools: BEAGLE, minimac
Application: GWAS
Organisms: Homo sapiens
Diseases: Sleep Initiation and Maintenance Disorders