Computational protocol: Functional characterization of a multi-cancer risk locus on chr5p15.33 reveals regulation of TERT by ZNF148

Protocol publication

[…] Imputation across 2 Mb of chr5p15.33 (250,000 to 2,250,000 bp, hg19) was performed for pancreatic cancer and testicular germ cell tumours (TGCT) using IMPUTE2 with phased haplotypes from the 1000 Genomes (1000G) reference set (Phase 1 integrated release 3, March 2012). Imputed SNPs with low MAF (<0.01) or low imputation quality (IMPUTE2 information score <0.5) were removed before the association analysis. Association analysis between SNPs and case-control status was performed with SNPTEST, using the score test of the log-additive genetic effect with covariate adjustment, as previously described. Imputation and association analysis for melanoma were performed using 1000G (Phase 1 integrated release 3, March 2012) as previously described. Imputation for lung cancer was performed using 1000G (Phase 1 integrated release 3, March 2012) with the same quality thresholds as described above, followed by association analysis and conditional analysis with GCTA using summary statistics from a meta-analysis of the six TRICL studies.

Overall, Region 2 was well imputed. Within the pancreatic cancer GWAS data, all common 1000G variants (n=195, MAF≥0.01) in Region 2 (defined as the genomic region between the two recombination hotspots at 1,306,281–1,367,281 in NCBI build hg19) had imputation accuracy (INFO) scores above 0.3 (the lowest quality score was 0.48). The imputation quality for the set of nine Region 2 variants most significantly associated with pancreatic cancer risk was high in the PanScan GWAS studies, with INFO scores ranging from 0.82 to 0.96 (average 0.92). Similar imputation quality scores were observed for these SNPs in the lung cancer, TGCT, and melanoma GWAS (INFO range 0.82 to 0.98; average 0.94). In addition, imputation quality was high for all SNPs that were statistically correlated with rs36115365 in 1000G CEU data (r²>0.2).
In PanScan, only a single such 1000G variant had an INFO score below 0.8 (rs186156459; INFO=0.79), suggesting that poor imputation quality did not lead to the exclusion of additional strong functional candidates from consideration. Similar imputation quality was likewise observed for the other cancer GWAS.

For completeness, we assessed the newer 1000G (Phase 3, October 2014) reference dataset and noted an insertion/deletion variant (rs3030832) that was highly correlated with rs36115365 (r²=0.87 in EUR). We therefore re-imputed the pancreatic cancer GWAS dataset with the newer 1000G reference set to re-assess the association signal across Region 2, including this variant. rs36115365 became non-significant when the analysis was conditioned on rs3030832, as did rs3030832 when the analysis was conditioned on rs36115365, indicating that this variant is among the highly correlated variants representing Region 2 and thus represents an additional strong functional candidate. We also observed seven additional variants with ORs similar to or slightly higher than that of rs36115365 (maximum OR=1.42). To formally test whether these seven variants represented potential functional variants in Region 2, we performed a series of conditional analyses. After the analysis was conditioned on rs36115365, we noted a large drop in significance for these seven variants, whereas conditioning on each of the seven variants did not dramatically influence the significance of rs36115365. […]
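The post-imputation quality control described above (removing variants with MAF <0.01 or INFO <0.5, then summarizing INFO scores within Region 2) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the `ImputedVariant` record, the function names, and the toy variants are all hypothetical, and only the thresholds and the Region 2 coordinates are taken from the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ImputedVariant:
    """Hypothetical record for one imputed variant (not an IMPUTE2 format)."""
    variant_id: str
    position: int  # base-pair position on chr5, hg19
    maf: float     # minor allele frequency
    info: float    # IMPUTE2 information score


# Thresholds and coordinates as stated in the protocol text.
MIN_MAF = 0.01
MIN_INFO = 0.5
REGION2 = (1_306_281, 1_367_281)


def passes_qc(v, min_maf=MIN_MAF, min_info=MIN_INFO):
    """Keep a variant unless its MAF or INFO falls below the threshold."""
    return v.maf >= min_maf and v.info >= min_info


def in_region(v, region=REGION2):
    """True if the variant lies within the (inclusive) region bounds."""
    start, end = region
    return start <= v.position <= end


def summarize_info(variants):
    """Return (lowest, highest, average) INFO score for a non-empty list."""
    scores = [v.info for v in variants]
    return min(scores), max(scores), mean(scores)


if __name__ == "__main__":
    # Toy values made up for illustration; they are not study data.
    toy = [
        ImputedVariant("toy1", 1_310_000, 0.20, 0.92),
        ImputedVariant("toy2", 1_320_000, 0.005, 0.95),  # fails the MAF filter
        ImputedVariant("toy3", 1_330_000, 0.10, 0.40),   # fails the INFO filter
        ImputedVariant("toy4", 2_000_000, 0.30, 0.99),   # outside Region 2
    ]
    kept = [v for v in toy if passes_qc(v) and in_region(v)]
    print([v.variant_id for v in kept], summarize_info(kept))
```

In practice the MAF and INFO values would be read from the imputation software's per-variant output rather than typed in by hand; the sketch only makes the filtering logic explicit.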

Pipeline specifications

Software tools: IMPUTE, SNPTEST, GCTA
Application: GWAS
Diseases: Lung Neoplasms, Melanoma, Neoplasms
Chemicals: Zinc