Computational protocol: Common genetic variants, acting additively, are a major source of risk for autism

Protocol publication

[…] DNA samples from SSC and AGP family members genotyped on the Illumina Infinium® 1Mv3 (duo) microarray or the Illumina Infinium® 1Mv1 microarray were analyzed here. Specifically, 71.8% of qualifying SSC samples were genotyped on the Illumina Infinium® 1Mv3 (duo) microarray, while 98.7% of AGP samples were genotyped on the Illumina Infinium® 1Mv1 microarray. Both arrays genotype roughly 1,000,000 single nucleotide polymorphisms (SNPs), and the overlap between the two SNP sets is almost perfect.
The SSC sample includes >2,000 genotyped families. However, our analyses targeted a homogeneous subset of these data. First, we included only samples genotyped on an Illumina 1M array; families had to be ‘quads’ consisting of an unaffected mother and father, an affected proband and an unaffected sibling; and all members of a quad had to have complete genotypes (>95% completion rate). Only samples of European ancestry were included. European ancestry for the SSC families was determined using GemTools for all available SSC probands. To conduct the ancestry analysis, we selected 5,156 SNPs that had a genotype call rate of at least 99.9%, had a minor allele frequency (MAF) > 0.05, and were at least 0.5 Mb apart. Individuals were clustered into nine ancestry groups based on four significant dimensions of ancestry. The central five clusters, which held a total of 1,686 families, were identified as being of European descent. The ancestry cluster information combined with complete genotype information yielded a total of 965 SSC families for the analysis.
The AGP Stage 1 dataset comprised 1,471 families, of which 1,141 were previously identified to be of European ancestry. European ancestry was confirmed by analyses identical to those applied to the SSC families (see Additional file: Figure S1).
[...] Heritability of ASD from probands versus controls was estimated using GCTA software, which encodes the theory laid out in the references cited in the original publication. Prevalence of ASD was taken to be 1%.
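The SNP selection for the ancestry analysis described above (call rate of at least 99.9%, MAF > 0.05, spacing of at least 0.5 Mb) can be illustrated with a minimal Python sketch. The text does not say how the spacing constraint was enforced, so the greedy left-to-right thinning below is an assumption, and the `Snp` record and `select_ancestry_snps` function are hypothetical names introduced only for illustration; this is not GemTools code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical per-SNP summary record (not a GemTools data structure)."""
    name: str
    chrom: str
    pos_bp: int
    call_rate: float  # fraction of samples with a called genotype
    maf: float        # minor allele frequency


def select_ancestry_snps(snps, min_call_rate=0.999, min_maf=0.05,
                         min_gap_bp=500_000):
    """Keep SNPs passing the call-rate and MAF thresholds, then thin them so
    that retained SNPs on the same chromosome are at least min_gap_bp apart.

    Assumption: thinning is greedy, scanning each chromosome left to right.
    """
    selected = []
    last_kept_pos = {}  # chromosome -> position of the last retained SNP
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos_bp)):
        if snp.call_rate < min_call_rate or snp.maf <= min_maf:
            continue  # fails quality or frequency threshold
        prev = last_kept_pos.get(snp.chrom)
        if prev is not None and snp.pos_bp - prev < min_gap_bp:
            continue  # too close to the previously retained SNP
        selected.append(snp)
        last_kept_pos[snp.chrom] = snp.pos_bp
    return selected
```

A greedy scan is one simple way to satisfy the spacing rule; other thinning strategies would return a different but equally valid SNP set.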
For each of the analyses, Genetic Relationship Matrices (GRMs) were determined for each of the 23 chromosomes using the --make-grm option in GCTA. These were then combined into an overall matrix using the --mgrm option in GCTA. The first 10 principal components of ancestry were determined using --pca in GCTA. These 10 principal components were then used as covariates for estimating the heritability using --reml in GCTA. A prevalence of 0.01 for autism spectrum disorders was used to transform the heritability on the observed scale to the heritability on the liability scale.
[...] While 713,259 SNPs were used for primary analyses, they constitute a small fraction of the SNPs in the human genome. Hence the heritability presented could underestimate total heritability. On the other hand, because genotypes of SNPs in close proximity tend to be correlated due to linkage disequilibrium, it does not follow that the SNPs used here capture only a small fraction of the heritability. To determine the shortfall in “genomic coverage” and how it impacts estimates of heritability, we performed an experiment using data from the 1,000 Genomes project, under the assumption that coverage of common variants in the 1,000 Genomes data is perfect. Assessing all SNPs genotyped in our data, as well as subsets thereof, we estimated heritability of liability. Using the same subsets, but in 1,000 Genomes subjects, we estimated levels of genomic coverage. We can then relate estimated heritability to genomic coverage to develop a functional relationship between the two.
We performed the experiment assessing “genomic coverage” as follows. We assumed genomic coverage of SNPs with MAF > 0.1 would be essentially complete for the 379 European samples analyzed by the 1,000 Genomes project. From these genomes we selected 50 1-Mb regions in which at least 500 SNPs in the 1,000 Genomes samples had MAF > 0.1.
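The GCTA steps described above (per-chromosome GRMs, merging, principal components, REML with a prevalence of 0.01) can be sketched as a list of command lines built in Python. Only --make-grm, --mgrm, --pca and --reml are named in the text; the binary name `gcta64`, the file prefixes, and the remaining options (--bfile, --chr, --grm, --pheno, --qcovar, --prevalence, --out) are assumptions about how such a run is typically set up, not the authors' exact invocation. How chromosome 23 (X) was handled is also not stated, so the chromosome list is left as a parameter. The second function shows the standard observed-to-liability transformation for ascertained case-control data, which is assumed here to be what the prevalence setting triggers; `case_fraction` is a hypothetical parameter name for the proportion of cases in the analyzed sample.

```python
from statistics import NormalDist


def gcta_commands(bfile, pheno, out, chromosomes=range(1, 24),
                  n_pcs=10, prevalence=0.01):
    """Return GCTA command lines (as argument lists) for the described steps.

    Assumptions: binary name 'gcta64', PLINK-format input prefix `bfile`,
    and a text file '<out>_grm_list.txt' listing the per-chromosome GRM
    prefixes, one per line. Nothing is executed here.
    """
    cmds = []
    for c in chromosomes:
        # One GRM per chromosome.
        cmds.append(["gcta64", "--bfile", bfile, "--chr", str(c),
                     "--make-grm", "--out", f"{out}_chr{c}"])
    # Merge the per-chromosome GRMs into one overall matrix.
    cmds.append(["gcta64", "--mgrm", f"{out}_grm_list.txt",
                 "--make-grm", "--out", out])
    # First n_pcs principal components of the merged GRM.
    cmds.append(["gcta64", "--grm", out, "--pca", str(n_pcs), "--out", out])
    # REML with the principal components as quantitative covariates.
    cmds.append(["gcta64", "--grm", out, "--pheno", pheno,
                 "--qcovar", f"{out}.eigenvec", "--reml",
                 "--prevalence", str(prevalence), "--out", out])
    return cmds


def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Transform observed-scale heritability to the liability scale.

    h2_liab = h2_obs * [K(1-K)]^2 / (z^2 * P(1-P)), where K is the population
    prevalence, P the proportion of cases in the sample, and z the standard
    normal density at the liability threshold for K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k_term = (prevalence * (1.0 - prevalence)) ** 2
    return h2_obs * k_term / (z ** 2 * case_fraction * (1.0 - case_fraction))
```

The commands are returned rather than run so that they can be inspected or passed to `subprocess.run` once the file names match a real dataset.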
Coverage of these regions by the 713,259 SNPs was calculated as a function of the number of other SNPs with MAF > 0.1 that were tagged by (correlated with) them; call the set of M = 713,259 SNPs “tagSNP”. The tagging evaluation was implemented using Hclust. With tagSNP forced into the set of tag SNPs selected for the region, Hclust evaluated how many additional independent SNPs, N, were required to cover the region when the minimum linkage disequilibrium r² amongst tags could be no less than X, for X ∈ {0.5, 0.7, 0.9}. Then, for each value of X, M/(M+N) estimates the coverage. Next, we randomly sampled 50%, 25%, and 12.5% of the 713,259 SNPs (356,630, 178,315, and 89,158 SNPs, respectively) five times, and each time estimated coverage for these subsets. […]
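The coverage ratio M/(M+N) and the random subsampling step described above can be illustrated with a short Python sketch. The function names, the random seed, and the choice to round subset sizes up are assumptions made for illustration (rounding up happens to reproduce the subset sizes quoted in the text); the tagging step itself, performed by Hclust, is not reimplemented here.

```python
import math
import random


def coverage(m_tag_snps, n_extra_tags):
    """Coverage estimate M / (M + N): M tag SNPs already genotyped, N extra
    independent SNPs still needed to cover the region at a given r² cutoff."""
    return m_tag_snps / (m_tag_snps + n_extra_tags)


def subset_sizes(total, fractions=(0.5, 0.25, 0.125)):
    """Number of SNPs in each random subset.

    Assumption: fractional counts are rounded up.
    """
    return [math.ceil(total * f) for f in fractions]


def random_subsets(snp_ids, fraction, n_reps=5, seed=0):
    """Draw n_reps independent random subsets (without replacement within a
    subset) containing the given fraction of the SNP identifiers."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    size = math.ceil(len(snp_ids) * fraction)
    return [rng.sample(snp_ids, size) for _ in range(n_reps)]
```

Each subset returned by `random_subsets` would then be passed through the same tagging evaluation, yielding one value of N (and hence one coverage estimate) per replicate and r² cutoff.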

Pipeline specifications

Software tools: GemTools, GCTA, SNPinfo, Hclust
Applications: Population genetic analysis, GWAS
Organisms: Homo sapiens