Computational protocol: Genetics of Callous-Unemotional Behavior in Children

Similar protocols

Protocol publication

[…] The sample was drawn from the Twins Early Development Study (TEDS), a multivariate longitudinal study which recruited over 11,000 twin pairs born in England and Wales in 1994, 1995 and 1996 , whose families are representative of the UK population . Twins with severe medical problems or severe birth complications or whose zygosity could not be determined were excluded from the sample. To decrease heterogeneity of ancestry, the sample was restricted to families who identified themselves as white and whose first language was English.In order to make our twin sample as comparable as possible to our GCTA and GWA samples, we selected those twin pairs for whom one member of the twin pair was chosen for the GCTA and GWA analyses. For GCTA and the discovery sample of the GWA analysis, we included unrelated individuals by selecting only one member of each twin pair for whom GWA genotyping and CU data were available. For the GCTA analysis, we verified that the unrelated individuals were less genetically related than fourth-degree relatives (genetic relatedness >.025), the standard GCTA exclusion criterion.Based on these selection criteria, our twin analyses included 1099 MZ pairs and 1787 DZ pairs. Our GCTA and GWA discovery sample included 2,930 children; the slightly smaller number of twin pairs was caused by twin pairs for whom the co-twin did not have CU data. [...] An index of information (Fisher) for the allele frequency at each of 932,533 called SNPs was calculated using SNPTEST version 2.1.1 . Autosomal SNPs were excluded if this information index was below 0.975, if the minor allele frequency was less than 1%, if greater than 2% of genotype data were missing, or if the Hardy Weinberg p-value was lower than 10−20. Association between the SNP and the plate on which samples were genotyped was calculated and SNPs with a plate effect p-value less than 10−6 were also excluded. In addition, SNPs were manually filtered for call quality by visual inspection of the hybridization intensity plots using EVOKER software (http://sourceforge.net/projects/evoker/). The above filters removed 22.7% of the SNPs, leaving 699,388 autosomal SNPs for further analysis. [...] In order to increase the number of SNPs used in our GCTA and GWA analyses, imputation was carried out using the IMPUTE version 2 software on the genotype data after application of quality control procedures, using a two-stage approach with both a haploid reference panel and a diploid reference panel. For the haploid reference panel we used HapMap phase II and III SNP data on the 120 unrelated CEU trios. 5,175 WTCCC2 controls were genotyped on both Affymetrix 6.0 and Illumina Human1.2M-Duo arrays (Illumina Inc., La Jolla, CA), and these were used for the diploid reference panel. Imputed SNPs were retained for analysis if they were genotyped using the Affymetrix 6.0 array, if they were genotyped using the Illumina Human1.2M-Duo array and obtained an information score ≥0.90, or if they were imputed and obtained an information score ≥0.98. Using these criteria, 1,024,929 imputed SNPs were retained for the GCTA and GWA analyses, in addition to the 699,388 measured SNPs described above. [...] MZ and DZ twin intraclass correlations were calculated and standard twin model-fitting was used to estimate additive genetic (A), common or shared environment (C), and residual or non-shared environment (E) . Although twin model-fitting is usually referred to as ACE models, in fact the twin design – unlike GCTA, discussed in the next section – can include non-additive as well as additive genetic effects. In quantitative genetics, estimates of heritability that include non-additive as well as additive genetic effects is called broad heritability, in contrast to narrow heritability, which is limited to additive genetic effects. In twin analysis, the additive genetic model assumes that DZ twins are half as similar as MZ twins because the genetic relatedness of DZ twins is 50% for additive genetic effects, whereas the relatedness of MZ twins is 100%. This twofold greater genetic resemblance of MZ as compared to DZ twins is the reason why heritability is often estimated by doubling the difference between MZ and DZ correlations. (For example, MZ and DZ correlations of 0.80 and 0.40, respectively, imply 80% heritability).In contrast, the hallmark of non-additive genetic effects is that the DZ correlation is less than half the MZ correlation because epistatic (inter-locus) gene-gene interactions scarcely contribute to DZ similarity but are shared entirely by MZ twins. If non-additive genetic effects are important, the twin method will detect these effects, although its ability to estimate these effects is limited. For example, if MZ and DZ twin correlations are 0.80 vs. 0.20, respectively, simply doubling the difference between MZ and DZ correlations – an additive genetic model which would be inappropriate given the non-additive pattern of twin correlations – would yield a heritability estimate of 120%. However, heritability cannot exceed the MZ twin correlation, so that the heritability estimate in this example would be constrained to be 80%. Model-fitting would show that the ACE model does not fit the data in this example. An allowance is made for non-additivity in a twin model called ADE, in which ‘D’ refers to dominance (intra-locus allele-allele interaction). In the ADE model, dominance discounts DZ resemblance from 50% for the A parameter to 25% for the D parameter. However, this adjustment does not cover the extreme epistatic case in which the DZ correlation could be zero despite a high MZ correlation. However, even in this extreme case – for example an MZ correlation of 0.80 and a DZ correlation of 0.00 – twin model-fitting would detect genetic influence and would cap the heritability estimate at 80%, as suggested by the MZ correlation of 0.80.In summary, the twin design can detect the presence of non-additive genetic effects, although it is limited in its ability to distinguish additive and non-additive genetic effects. Greater detail about distinguishing additive and non-additive genetic variance in twin designs is available (e.g. ). As is usual in twin analyses, residualized scores were used that were independent of age and sex because age and sex are perfectly correlated across pairs, which would be misinterpreted as C in twin analyses. The OpenMx package for R was used for twin maximum-likelihood model-fitting using full-information matrices . [...] Genome-wide association (GWA) analysis was conducted using a linear regression approach implemented in SNPTEST v2.0 under an additive model. This approach uses a frequentist method to account for uncertainty of genotype information . Because even small differences in allelic frequency within subgroups in the population can generate false-positive results, eight principal components representing population ancestry were used to control for population stratification. Sex and DNA sample plate number were also included as covariates. Results were visualized using Manhattan plots, quantile-quantile (Q-Q) plots, and genotype-phenotype plots, generated in R ; a regional association plot created using LocusZoom .Following SNP quality control and SNP imputations, described earlier, we performed several preliminary analyses prior to GWA analysis. First, principal component analysis (PCA) was used to attenuate GWA biases due to population structure. PCA was conducted on a subset of 105,556 autosomal SNPs post QC, selected after pruning to remove SNPs in high linkage disequilibrium (r2>0.2) and to exclude high linkage disequilibrium genomic regions so as to ensure that only genome-wide effects were detected . Application of the Tracy-Widom test indicated that eight principal components were significant using a threshold of p<0.05. We caution that the inclusion of principal components as covariates may not be sufficient to remove biases in estimation due to population structure . Our second preliminary analysis involved normalizing CU trait scores by transforming the ranked data to the quantiles of a standard Normal distribution using the van der Waerden transformation , and taking the residuals after regressing the resulting score on age at measurement.For the GWA analysis, each autosomal SNP was tested for association with CU at 7, 9, and 12 using a multivariate method that was similar in its essentials to the test of a longitudinal composite but required slightly less restrictive statistical assumptions. Using a linear regression framework, we calculated score statistics to test a hypothesis that the SNP had an equal effect on CU at each age. The test evaluated a single parameter and hence had 1 degree of freedom. Sex, birth year/school year cohort, and the first eight principal components of the genotype data were included as covariates in the regression model. The statistical framework was able to account for missingness in the outcome variables as a function of these covariates, assuming that data for these covariates were missing at random . The test was implemented as a custom library for R (http://www.cran.r-project.org).Probability values were adjusted by genomic control separately for genotyped SNPs, SNPs that were genotyped in the WTCCC2 controls and imputed in the TEDS sample, and SNPs that were imputed in both WTCCC2 controls and the TEDS sample. […]

Pipeline specifications

Software tools GCTA, SNPTEST, Evoker, IMPUTE, OpenMx, LocusZoom
Applications Miscellaneous, GWAS