Computational protocol: Dissecting Allele Architecture of Early Onset IBD Using High-Density Genotyping

Protocol publication

[…] All chip images were merged into a single batch for simultaneous genotype calling with BeadStudio. Included in these samples were 133 replicates (the same sample run twice) and 33 parent-offspring pairs. These replicates and family members were used to directly test for genotyping error. Samples were tested for cryptic (unexpected) relatedness, incorrect genders, overall data completeness, and overall heterozygosity; samples were excluded if they had less than 90% data completeness, differed by more than three standard deviations from the mean heterozygosity for the study, had the wrong gender, or were unexpectedly the first-degree relative of any other sample in the study. Approximately 15% of our samples were dropped for one or more of these reasons.

BeadStudio reported genotypes for 189,012 autosomal SNPs. Any SNP with more than 2% missing data, only one allele present (i.e., not segregating), any detectable genotype error (either through duplicates or parent-offspring pairs), or a Hardy-Weinberg p-value less than 10⁻⁵ in controls was dropped. This resulted in a final dataset of approximately 140,000 autosomal SNPs. Over 40% of the dropped SNPs were dropped simply because there was only one allele present (~20,600 SNPs).

For this study, we performed three main association analyses. We first evaluated patients with a diagnosis of CD with matched controls. We then evaluated all remaining cases (IBDminusCD), which consisted of cases with UC or other indeterminate IBD diagnosis (but not CD). Finally, we performed association analyses using very early onset (VEO) CD or UC / IBD-U labeled as “cases” contrasted with matched early onset (EO) CD or UC / IBD-U labeled as “controls”. Matching of cases and controls was done by determining principal components (PC) with Eigenstrat, plotting PC1 against PC2, followed by visual inspection and elimination of outlier samples.
Four successive rounds were necessary until a satisfactory matching of CD cases/controls and UC (IBDminusCD) cases/controls was obtained.

After all SNP and sample removal, 1,633 controls, 801 CD, and 207 UC / IBD-U individuals remained. From the 801 CD cases, we performed a second association with 267 VEO CD “cases” contrasted with 525 EO CD “controls”. From the 207 UC cases, there were 62 VEO UC / IBD-U “cases” contrasted with 143 EO UC / IBD-U “controls”. We excluded 7 CD and 2 UC samples because of indeterminate age of diagnosis. All samples were unrelated to one another.

All association analysis was performed with PLINK 1.0.7 via a logistic regression, additive model, adjusting for the first five principal components of ancestry as determined by Eigenstrat. Replication of the top SNPs from Jostins et al. successfully genotyped in this study was assessed by performing a Bonferroni correction for 158 tests (0.05/158), resulting in a threshold of 0.0003. The complete summary of all SNP associations for CD and UC is contained in the supplemental materials. The summary for the 158 top SNPs for CD and UC is also included in the supplemental materials.

Polygenic liability scores were calculated assuming an underlying normal distribution of liability, with disease (CD or UC) state representing a threshold on the continuous liability scale. To do so, we first assumed that the odds ratio estimated by Jostins et al. is the true odds ratio for the identified allele. Using the observed allele frequency in controls from this study, and an assumed prevalence for CD of 5 in 10,000 and for UC of 1 in 10,000, independent of sex, the additive effect on liability of each of the 163 SNPs identified by Jostins et al. was calculated. Final polygenic liability for each sample was calculated by summing the additive effects for both alleles at all 163 loci. Thus we assume that effects are additive both within loci (dominance) and across loci (epistasis) on the liability scale.
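
The liability calculation described above can be illustrated with a minimal sketch. The excerpt does not reproduce the cited procedure, so what follows is one common liability-threshold construction and not necessarily the authors' exact method. It assumes Hardy-Weinberg genotype proportions, a per-allele (multiplicative) odds ratio, control allele frequency as a stand-in for population frequency, and unit liability variance within each genotype. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability is assumed standard normal in the population


def genotype_penetrances(odds_ratio, risk_allele_freq, prevalence):
    """Return P(disease | 0, 1, 2 risk alleles) under a per-allele odds-ratio model.

    The baseline penetrance is found by bisection so that the genotype-weighted
    mean penetrance equals the assumed population prevalence.
    """
    if not (0 < prevalence < 1 and 0 < risk_allele_freq < 1 and odds_ratio > 0):
        raise ValueError("prevalence and frequency must be in (0, 1); odds ratio > 0")
    p = risk_allele_freq
    # Assumption: Hardy-Weinberg proportions at the control allele frequency.
    geno_freq = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

    def penetrances(baseline):
        base_odds = baseline / (1 - baseline)
        odds = [base_odds * odds_ratio ** g for g in range(3)]
        return [o / (1 + o) for o in odds]

    lo, hi = 0.0, 1.0
    for _ in range(100):  # mean penetrance increases monotonically with baseline
        mid = (lo + hi) / 2
        mean_pen = sum(f * w for f, w in zip(penetrances(mid), geno_freq))
        if mean_pen < prevalence:
            lo = mid
        else:
            hi = mid
    return penetrances((lo + hi) / 2)


def per_allele_liability_effect(odds_ratio, risk_allele_freq, prevalence):
    """Approximate additive effect of one risk allele on the liability scale."""
    # Threshold T satisfies P(liability > T) = prevalence.
    threshold = -STD_NORMAL.inv_cdf(prevalence)
    pens = genotype_penetrances(odds_ratio, risk_allele_freq, prevalence)
    # Mean liability m of a genotype solves P(N(m, 1) > T) = penetrance,
    # assuming unit variance within each genotype (a small-effect approximation).
    means = [threshold + STD_NORMAL.inv_cdf(f) for f in pens]
    # Additive effect: half the difference between the two homozygote means.
    return (means[2] - means[0]) / 2


def polygenic_liability(risk_allele_counts, effects):
    """Sum per-allele effects over loci, weighted by risk-allele count (0, 1, or 2)."""
    return sum(c * a for c, a in zip(risk_allele_counts, effects))


# Usage with hypothetical SNPs; only the CD prevalence (5 in 10,000) is from the text.
cd_prevalence = 5 / 10_000
hypothetical_snps = [(1.20, 0.30), (1.10, 0.45), (0.90, 0.25)]  # (odds ratio, frequency)
effects = [per_allele_liability_effect(o, f, cd_prevalence) for o, f in hypothetical_snps]
print(polygenic_liability([2, 1, 0], effects))
```

Taking half the difference between homozygote means is a deliberately simple choice; a frequency-weighted regression of genotype means on allele count would be an equally defensible alternative.
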
This polygenic liability score was regressed against age of onset for all CD and UC cases separately. While our cohort is purely pediatric onset (less than 18 years), only about 10% of the Jostins et al. cohort were under 18 years of age at onset (personal communication with Dr. Judy Cho).

The data of this study have been deposited into the Odum Institute Dataverse Network hosted at UNC (http://arc.irss.unc.edu/dvn/) and are accessible through the accession number doi:10.15139/S3/11991. […]
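
The regression of polygenic liability on age of onset mentioned in this protocol can be sketched as follows. The text does not state which regression model was used, so this assumes an ordinary least-squares line with the liability score as the response and age of onset as the predictor. The function name and the example values are hypothetical.

```python
from statistics import fmean


def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only (not data from the study).
age_of_onset = [4.0, 7.5, 9.0, 12.0, 15.5]       # years, predictor
liability_score = [1.9, 1.6, 1.7, 1.4, 1.3]      # polygenic liability, response
slope, intercept = ols_slope_intercept(age_of_onset, liability_score)
print(f"slope per year of onset age: {slope:.4f}, intercept: {intercept:.4f}")
```

In keeping with the text, the fit would be run once for CD cases and once for UC cases; a negative slope would indicate higher polygenic liability at younger onset ages.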

Pipeline specifications

Software tools: PLINK, Dataverse
Applications: Genome annotation, GWAS
Diseases: Colitis, Ulcerative; Crohn Disease; Inflammatory Bowel Diseases; Genetic Diseases, Inborn