Computational protocol: Shared ancestral susceptibility to colorectal cancer and other nutrition related diseases

Protocol publication

[…] The study focused on SNPs for which ancestral alleles have previously been associated with nutrition-related complex diseases other than CRC, such as obesity, T2D and metabolic syndrome. Information about such SNPs was collected from 30 published reports by searching the PubMed database for the keywords “diabetes”, “obesity”, “metabolic syndrome” (OMIM ID: 605552) and “hypertension” (OMIM ID: 145500) up to 06/2009. Most of the articles were based on genome-wide association studies or were meta-analyses. A complete list of the publications can be found in the reference list of the […].

From these 30 reports, associations with the risk of the diseases and with the related quantitative traits were retrieved. The quantitative traits for diabetes were fasting glucose level and insulin resistance. For obesity, the traits were body mass index (BMI) and waist-to-hip ratio. The quantitative traits for hypertension and the metabolic syndrome were high-density lipoprotein (HDL) level, low-density lipoprotein (LDL) level, triglyceride level, salt sensitivity, blood pressure and insulin resistance. A complete list of the reported associations can be found in the Additional file.

The candidate SNP selection for the association study took place in three major phases (Figure):

(1) “Selection of Candidate SNPs”: All published SNPs were evaluated for the nature of the risk allele, either ancestral (A) or derived (D), and for the allele frequency differences between African, European and Asian populations (YRI: Sub-Saharan African population, Yoruba in Ibadan, Nigeria; CEU: Caucasian population, Utah residents with Northern and Western European ancestry from the CEPH collection; HCB: East Asian population, Han Chinese in Beijing, China; JPT: East Asian population, Japanese in Tokyo, Japan).
An allele was considered a “risk allele” when it was associated with a significantly increased risk of a key disease (OR > 1; statistical significance based on the criteria of the original publication), or when it was associated with a significant increase of quantitative values in the original publication. The nature of the risk allele was determined using the NCBI database. The reported ancestral susceptibility SNPs that showed an absolute allele frequency difference of >45% between the African and any non-African population were chosen for further investigation. The threshold value of 45% was set to detect variants with a “major-to-minor” allele change between populations, thus indicating a possible influence of selective pressure. A second, lower threshold (25%) was set for the difference between the YRI and the CEU population to account for the European population having separated from the African population more recently than the East Asian population.

(2) “Candidate Gene Definition”: The SNPs that passed the first selection criteria were evaluated for their location in the genome, possible functional effects, linkage disequilibrium (LD) with other polymorphisms within the gene region and the number of candidate SNPs in the gene region.

(3) “Tagging SNP Approach”: In addition to the evaluation of the reported ancestral susceptibility SNPs, a tagging SNP approach was carried out for each candidate gene or gene region using the genotyping data of the CEU population and HaploView© software. In addition to a minor allele frequency (MAF) of ≥5%, a tagging SNP had to meet the following criteria:
– be or capture a phase 1 SNP and/or
– be or capture a functional polymorphism and/or
– capture a maximal number of SNPs within a candidate gene or gene region with >25% allele frequency difference between the YRI and the CEU population

In the majority of cases, the reported ancestral susceptibility SNP itself was genotyped. When that was not possible (e.g. because the assay design failed due to the structure of the surrounding sequence), another SNP in LD with the reported SNP (r² > 0.9) was selected for genotyping in order to gain information about the reported ancestral susceptibility SNP indirectly. For large, diverse genes or gene clusters, additional tagging SNPs were selected in order to gain more knowledge about the genes. These tagging SNPs also had to fulfil the criterion of >25% allele frequency difference between the YRI and CEU population (Table).

Allele frequencies were obtained from the NCBI database for the submitter population IDs HapMap-CEU, -YRI, -HCB and -JPT (NCBI dbSNP Build 130). LD (r²) was obtained from HapMap using the HapMap3 and HapMart Genome Browser (Release #2, Phase 3) and HaploView© software (Version 3, Release 2, Analysis Panel CEU) with the implemented Tagger tagSNP selection algorithm. The data about possible functionality originate from the NCBI database and PolyPhen, inter alia. […]

In addition to allele frequency differences between the African and the non-African populations, the study aimed to detect additional signatures of selection in those genes that were associated with CRC in the case–control study. Highly variable allele frequencies in different populations might be attributable to processes such as genetic drift, bottleneck events or founder effects that occur during the separation from the ancestral population. To counter this problem, methods that are less susceptible to demographic influences were applied to investigate signatures of selective pressure. Instead of the traditional FST value and Tajima’s D test, Fay-Wu’s H and the Standardized Integrated Haplotype Score (|iHS|) were estimated using the Haplotter web application, which was developed on genome-wide HapMap data. The Fay-Wu’s H algorithm detects an unusual excess of high-frequency derived alleles in a gene region.
Strong negative Fay-Wu’s H values are considered signatures of a selective sweep. The iHS measures the length of haplotypes around a given SNP in comparison to the whole genome. Values < −1.5 or > 1.5 (|iHS| > 1.5) give conclusive evidence for natural selection, while values < −2 or > 2 (|iHS| > 2.0) give evidence for a powerful selection signal. Values were estimated for the YRI, the CEU and the East Asian (ASN) population. […]
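
The two numeric decision rules stated in this protocol, the allele frequency filter of phase (1) and the interpretation of |iHS|, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the dictionary layout and the category labels are hypothetical, and combining the 45% and 25% thresholds with a logical OR is an assumption about how the two cut-offs interact, since the text does not spell this out. Frequencies are given as fractions (0.45 corresponds to 45%).

```python
from typing import Mapping

# Cut-offs taken from the protocol text.
ANY_NON_AFRICAN_CUTOFF = 0.45  # YRI vs. any non-African population
CEU_CUTOFF = 0.25              # lower cut-off for YRI vs. CEU only
NON_AFRICAN = ("CEU", "HCB", "JPT")


def passes_frequency_filter(freqs: Mapping[str, float]) -> bool:
    """Return True if a SNP passes the phase (1) frequency filter.

    `freqs` maps a HapMap population code to the frequency of the
    ancestral risk allele in that population (hypothetical layout).
    Assumption: a SNP passes if EITHER threshold is exceeded.
    """
    yri = freqs["YRI"]
    # Absolute frequency difference between YRI and each non-African panel.
    diffs = {pop: abs(yri - freqs[pop]) for pop in NON_AFRICAN if pop in freqs}
    if any(d > ANY_NON_AFRICAN_CUTOFF for d in diffs.values()):
        return True
    return diffs.get("CEU", 0.0) > CEU_CUTOFF


def classify_ihs(ihs: float) -> str:
    """Map a standardized iHS value to the categories used in the text.

    The labels "powerful", "evidence" and "none" are hypothetical.
    """
    magnitude = abs(ihs)
    if magnitude > 2.0:
        return "powerful"
    if magnitude > 1.5:
        return "evidence"
    return "none"


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    example = {"YRI": 0.90, "CEU": 0.70, "HCB": 0.40, "JPT": 0.50}
    print(passes_frequency_filter(example))
    print(classify_ihs(-2.3))
```

The sign of the iHS is discarded before classification because the protocol interprets the score through its absolute value; a caller who needs the direction of the signal would have to keep the raw value alongside the label.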

Pipeline specifications

Software tools: Haploview, Tagger, SNPinfo, PolyPhen, Haplotter
Databases: OMIM
Applications: Population genetic analysis, GWAS, Genome data visualization
Diseases: Hypertension, Colorectal Neoplasms