Computational protocol: Single Nucleotide Polymorphisms in the Wnt and BMP Pathways and Colorectal Cancer Risk in a Spanish Cohort

Protocol publication

[…] SNP selection considered only functional markers with a minor allele frequency above 0.05 and at least two independent validation criteria, as established in dbSNP. This included all exonic variants selected with PupaSuite and cis gene-regulatory regions (5′ or 3′ UTR ends), as defined by the FESD web browser. 5′ UTR variants were included only when they met the above criteria and were presumed to lie in the potential binding site of a known transcription factor. 3′ UTR variants were included because of their potential relationship with miRNA binding regions. Because some of the selected genes had no SNPs of these kinds in any of the three browsers at the time of SNP selection, they had to be dropped from the study. In total, 43 SNPs within 21 genes were chosen to be screened as potential direct modifiers of CRC susceptibility.
rs4444235 and rs9929218 are two variants lying in the nearby and intronic regions of BMP4 and CDH1, respectively, that have recently been reported to be associated with the disease. Because the SNPs we had chosen within these two genes were not good taggers for these two variants (r-squared values were 0.6 for the SNPs in BMP4 and 0.02 for those in CDH1), we decided to include them in the study as well, although they did not fulfill our selection criteria, bringing the total number of interrogated SNPs to 45.
Genotyping was performed with MassARRAY technology (Sequenom Inc., San Diego, USA) at the Santiago de Compostela node of the Spanish Genotyping Center. Genotypes were called with Sequenom Typer v4.0 software using all the data from the study simultaneously. [...] Quality control was performed, first by excluding both SNPs and samples with genotype success rates below 95%, with the help of the Genotyping Data Filter (GDF).
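The 95% genotype success-rate filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the Genotyping Data Filter itself: the function names, the nested-dictionary layout, the use of `None` for a failed call, and the choice to filter SNPs before samples are all assumptions made for the example.

```python
def call_rate(genotypes):
    """Fraction of non-missing calls; None marks a failed genotype call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def filter_by_call_rate(calls, threshold=0.95):
    """Drop SNPs, then samples, whose call rate is below `threshold`.

    calls: dict mapping sample_id -> dict mapping snp_id -> genotype or None
           (hypothetical layout; every sample is assumed to list every SNP).
    Returns (kept_samples, kept_snps) as lists.
    """
    samples = list(calls)
    snps = list(next(iter(calls.values()))) if calls else []

    # Pass 1 (assumed order): keep SNPs typed in at least `threshold` of samples.
    kept_snps = [
        snp for snp in snps
        if call_rate([calls[s][snp] for s in samples]) >= threshold
    ]

    # Pass 2: keep samples typed for at least `threshold` of the retained SNPs.
    kept_samples = [
        s for s in samples
        if call_rate([calls[s][snp] for snp in kept_snps]) >= threshold
    ]
    return kept_samples, kept_snps
```

Filtering SNPs first means that a single badly performing assay does not cause otherwise good samples to be discarded; the reverse order is equally defensible and may give a different result on the same data.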
Genotypic distributions for all SNPs in controls were consistent with Hardy-Weinberg equilibrium, as assessed using a χ² test (1 df). All p-values obtained were ≥0.05, making genotyping artifacts unlikely (data not shown). Population stratification was assessed with Structure v2.2. Briefly, different scenarios were tested, each assuming a different number of underlying populations (K ranging from 1 to 4) and allowing for a large number of iterations (25,000 in the burn-in period followed by 500,000 repetitions). In each run, the mean log likelihood of the data for a given K, referred to as L(K), was estimated. We also performed multiple runs for each value of K, computing the overall mean L(K) and its standard deviation. All results were concordant with the original assumption of a single population. In addition, a Principal Component Analysis (PCA) was carried out with the EIGENSOFT tool smartpca to better visualize confounding variables, although the number of markers was very low. No differences in population stratification were found between cases and controls with either Structure or the first 10 components of the PCA. After quality control, 1746 samples (854 cases and 892 controls) and 37 SNPs remained for further analyses.
Association tests were performed by chi-squared tests for every single SNP and, where possible, for haplotypes, with both Haploview v4.0 and Unphased. In short, LD patterns across genes for which more than one SNP was genotyped were checked in Haploview, and haplotypes were tested for association using Unphased (to check whether any of the haplotypes was associated) and Haploview (to see which of the haplotypes was associated). Genotypic association tests, logistic regression analysis for sex and age adjustment, and stratified analysis between sporadic and familial groups were performed with PLINK v1.03.
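As an illustration of the 1-df chi-squared tests mentioned above, the sketch below computes a Hardy-Weinberg goodness-of-fit statistic from genotype counts and a case-control statistic from a 2×2 table of allele counts. It is not the code used by Haploview, Unphased, or PLINK; the function names are hypothetical, and treating the single-SNP test as an allelic 2×2 test is an assumption.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_chi2(n_aa, n_ab, n_bb):
    """Hardy-Weinberg goodness-of-fit test from genotype counts (1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Cells with zero expectation (monomorphic SNP) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_sf_1df(chi2)


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-squared test on a 2x2 table of allele counts (1 df)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    margins = (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / margins
    return chi2, chi2_sf_1df(chi2)
```

For example, `hwe_chi2(100, 200, 100)` describes controls in exact Hardy-Weinberg proportions, so the statistic is zero, while `allelic_chi2` takes allele counts (two per genotyped individual) rather than genotype counts.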
Odds ratios (ORs) and 95% confidence intervals were calculated for each statistic. To address multiple testing, permutation tests and the Bonferroni correction were used. Study power was estimated with CATS software. […]
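The odds ratio, its 95% confidence interval, and the Bonferroni threshold can be illustrated as follows. The source does not say how the intervals were computed, so the log-based (Woolf) approximation used here is an assumption, as are the function names. The default family-wise level of 0.05 is also assumed.

```python
import math


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.959964):
    """Odds ratio with a Woolf (log-based) confidence interval.

    z = 1.959964 gives a two-sided 95% interval. All four cells must be
    positive; no continuity correction is applied (an assumption).
    """
    cells = (case_exp, case_unexp, ctrl_exp, ctrl_unexp)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold alpha/m and a pass/fail flag per p-value."""
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values supplied")
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]
```

With the 37 SNPs retained after quality control, `bonferroni` would apply a per-test threshold of 0.05/37, roughly 0.00135. Permutation testing, which the protocol also used, is less conservative when markers are in LD but is not sketched here.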

Pipeline specifications

Software tools: PupaSuite, EIGENSOFT, Haploview, PLINK
Databases: dbSNP, FESD
Applications: Population genetic analysis, GWAS
Diseases: Colorectal Neoplasms