Computational protocol: Genetic variation in TLR or NFkappaB pathways and the risk of breast cancer: a case-control study

Protocol publication

[…] As part of a study of breast cancer and inflammation, we examined 1,536 SNPs in pro- or anti-inflammatory genes. For this study, we selected a total of 233 SNPs from 31 genes in TLR or NFκB signaling pathways. The following genes were included: AZI2, IFIH1, IKBKE, IRAK4, IRF3, MAP3K1, MAP3K7, MMP9, NFKB1, NFKB2, RELA, RELB, TANK, TBK1, TICAM1, TICAM2, TIRAP, TLR3, TLR4, TLR7, TLR9, TNF, TNFRSF1A, TNFRSF1B, TOLLIP, TRAF3, TRAF6, UBE2C, UBE3A, VISA, and ZBP1. Using the software SNAGGER on publicly available HapMap and SeattleSNPs data, tagSNPs were selected among Caucasians based on an r² value of at least 0.80 and a minor allele frequency (MAF) of at least 0.05. The tagSNPs were chosen from regions representing the candidate genes plus 4,000 base pairs both 3′ and 5′ of the gene. SNP selection was prioritized based on functional importance, giving SNPs in coding regions priority over those in other regions. To ensure that at least one SNP from each bin would be successfully genotyped, more than one tagSNP was chosen where a bin included more than 10 SNPs. Additionally, coding SNPs within candidate genes with a MAF of at least 0.02, as well as SNPs found to be associated with cancer risk in previous studies, were included in the panel. For example, rs889312 in the region surrounding MAP3K1 was selected for analysis based on its significance in prior genome-wide association studies (GWAS). [...]

To account for potential confounding due to population stratification, we used principal components analysis to restrict our sample to 1,652 white women. Briefly, principal components were computed from 872 controls after standardizing the 1,349 SNPs that passed our quality control checks according to the method outlined by Price et al. The first principal component was sufficient to distinguish white from non-white women. Principal components were then computed for the entire sample of 1,759 cases and controls after standardizing the 1,349 SNPs to the control population.
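The standardization and projection steps described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code (the analyses were run in Stata and R). It assumes genotypes are coded as 0/1/2 minor-allele counts, and that the Price et al. standardization centers each SNP on its control mean and divides by sqrt(p(1 - p)), with p a smoothed control allele frequency. It also substitutes power iteration for a full eigendecomposition, which is enough here because only the first principal component is needed. All function names are hypothetical.

```python
import math
import random


def control_scaling(controls):
    """Per-SNP (mean, scale) estimated from control genotypes only."""
    n = len(controls)
    params = []
    for j in range(len(controls[0])):
        total = sum(row[j] for row in controls)
        mean = total / n
        # Smoothed allele frequency; stays strictly between 0 and 1.
        p = (1 + total) / (2 + 2 * n)
        params.append((mean, math.sqrt(p * (1 - p))))
    return params


def standardize(genotypes, params):
    """Center and scale every sample with the control-derived parameters."""
    return [[(g - m) / s for g, (m, s) in zip(row, params)] for row in genotypes]


def first_pc(z, iters=100, seed=0):
    """Leading SNP-loading vector of Z^T Z, found by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(len(z[0]))]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in z]
        w = [sum(s * row[j] for s, row in zip(scores, z)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(z, v):
    """Score of each standardized sample on the loading vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in z]
```

In use, `control_scaling` and `first_pc` would be run on the controls alone, after which `project(standardize(all_samples, params), v)` scores cases and controls on the same axis, so that one cut-off can be applied to both groups.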
We determined clusters of white and non-white subjects using the same restriction criteria from the control population. The final study sample consisted of 1,652 individuals who clustered with white women and self-reported their race as white or Hispanic.

Using these 845 cases and 807 controls, the relative risk of breast cancer associated with each SNP was approximated using logistic regression to compute odds ratios (OR) and 95% confidence intervals (CI). All models were adjusted for continuous linear age at reference and were log-additive. However, dominant models were fit when genotype cell counts were less than 5 for either cases or controls. We adjusted for multiple comparisons within a gene by using a minP permutation test with 10,000 replications to assess the significance of each gene. For genes found to be significant (P ≤ 0.05) based on the minP permutation test, we used logistic regression to examine the association between SNPs and the risk of breast cancer with ductal histology (N = 565) and of the luminal subtype (N = 744), each compared to all controls. These models were also adjusted for continuous linear age at reference and were log-additive.

The gene set ridge regression in association studies (GRASS) algorithm was used to conduct exploratory pathway analyses for genes in TLR or NFκB pathways. We examined the association between breast cancer risk and two pathways for genes in our dataset by selecting genes from the Kyoto Encyclopedia of Genes and Genomes (KEGG) “Toll-like receptor signaling pathway” (http://www.genome.jp/kegg/pathway/hsa/hsa04620.html). The first pathway included TLR3, TLR4, TLR7, TLR9, TIRAP, TICAM1, TICAM2, TOLLIP, IRAK4, TRAF3, TRAF6, MAP3K7, IRF3, and IKBKE. The second pathway included these genes in addition to NFKB1, NFKB2, RELA, and RELB. Prior to running any models with GRASS, we imputed any missing SNP values.
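The gene-level minP permutation test mentioned in this passage can be illustrated with a small sketch. It is not the authors' implementation, and it makes one simplifying substitution: instead of age-adjusted logistic models, each SNP is scored with the Cochran-Armitage trend statistic (N times the squared correlation between allele count and case status). Because every SNP then shares the same 1-df reference distribution, the largest statistic corresponds to the smallest p-value, so taking the maximum is equivalent to taking minP. Function names and the default seed are hypothetical.

```python
import random


def trend_stat(genotypes, status):
    """Cochran-Armitage trend statistic, N * r^2, for one SNP."""
    n = len(status)
    mg = sum(genotypes) / n
    ms = sum(status) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    sss = sum((s - ms) ** 2 for s in status)
    sgs = sum((g - mg) * (s - ms) for g, s in zip(genotypes, status))
    if sgg == 0 or sss == 0:
        return 0.0  # monomorphic SNP or single-class outcome
    return n * sgs * sgs / (sgg * sss)


def gene_minp(snps, status, n_perm=10000, seed=0):
    """Permutation p-value for the best SNP in a gene.

    snps   -- list of per-SNP genotype lists (0/1/2), one list per SNP
    status -- case (1) / control (0) labels, same order as the genotypes
    """
    rng = random.Random(seed)
    observed = max(trend_stat(g, status) for g in snps)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels keeps the correlation among SNPs intact.
        rng.shuffle(shuffled)
        if max(trend_stat(g, shuffled) for g in snps) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Permuting case status rather than genotypes is the key design choice: the linkage disequilibrium among SNPs in a gene is preserved in every replicate, so the null distribution of the best SNP reflects the effective number of tests in that gene.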
All imputation was performed using BEAGLE 3.3 with a reference panel of phased genotype data from 283 European individuals sequenced by the 1000 Genomes Project. Pathways were deemed significant based on a permutation test with 10,000 replications.

Finally, we used publicly available data from the Cancer Genetics Markers of Susceptibility (CGEMS) Breast Cancer Genome-Wide Association Scan to validate our significant findings. A Holm multiple test procedure was used to compute permutation-corrected p-values with 10,000 replications for individual SNPs within significant genes in our data. For SNPs found to be significant (Holm P ≤ 0.05), the risk of breast cancer associated with each SNP was computed using logistic regression in the CGEMS data, after adjusting for age in 5-year groups. BEAGLE was used to impute seven SNPs that were not already present within the CGEMS data using phased genotype data from the 1000 Genomes Project as a reference panel. Six SNPs with successful imputation (r² > 0.90) were used for analysis.

All analyses were performed using Stata 11 or R version 2.10.1. […]
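The odds ratios and 95% confidence intervals used throughout this protocol come from logistic regression on genotype plus an age adjustment. The sketch below shows one way such a model could be fit from scratch with Newton-Raphson. It is illustrative only (the original analyses used Stata and R), and it assumes 0/1/2 genotype coding for the log-additive model and a Wald interval with z = 1.96. The helper `age_group_dummies` mirrors the 5-year age grouping used for the CGEMS models. All names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit. x rows exclude the intercept. Returns (beta, cov)."""
    rows = [[1.0] + list(r) for r in x]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Covariance = inverse of the information matrix from the final iteration.
    cov = [solve(info, [1.0 if i == j else 0.0 for i in range(k)]) for j in range(k)]
    return beta, cov


def age_group_dummies(ages, width=5):
    """Indicator columns for age groups of the given width (first group = reference)."""
    groups = [int(a // width) for a in ages]
    levels = sorted(set(groups))[1:]
    return [[1.0 if g == lv else 0.0 for lv in levels] for g in groups]


def odds_ratio(genotype, case, covariates=None):
    """Per-allele OR with a 95% Wald CI, optionally adjusted for covariates."""
    covariates = covariates or [[] for _ in genotype]
    x = [[float(g)] + list(c) for g, c in zip(genotype, covariates)]
    beta, cov = fit_logistic(x, case)
    b, se = beta[1], math.sqrt(cov[1][1])
    return math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)
```

For continuous linear age, pass `covariates=[[a] for a in ages]`; for the 5-year grouping, pass `covariates=age_group_dummies(ages)`. A dominant model corresponds to recoding the genotype as `1 if g > 0 else 0` before calling `odds_ratio`.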

Pipeline specifications

Software tools: SNPinfo, GRASS
Databases: KEGG
Application: GWAS
Diseases: Breast Neoplasms, Neoplasms