Computational protocol: Power to Detect Risk Alleles Using Genome-Wide Tag SNP Panels

Protocol publication

[…] Tag SNPs were chosen from HapMap data using an algorithm incorporating the LD statistic r². The genome was divided into 1-Mb nonoverlapping segments, and pairwise r² values were calculated for loci within 200 kb of each other. Approximately 314,000 SNP loci were first selected from the CEU population in the Phase I HapMap data (release 16c), which contained approximately 775,000 SNP loci with MAF ≥ 0.05 in that population. First, tag SNPs were chosen using a strict r² threshold of 0.8. If any SNP in a bin of correlated SNPs was within 10 kb of a RefSeq gene or in an evolutionarily conserved region (ECR), the tag SNP was retained as a “must-keep” SNP. A SNP was defined as being in an ECR if it mapped to one of the phastCons elements with a PHAST score ≥ 50. A second analysis was then done using a less stringent r² threshold of 0.7, choosing additional tag SNPs genome-wide in addition to the “must-keep” tag SNPs selected in the first analysis. This strategy provided a higher density of tag SNPs within 10 kb of genes or in ECRs.
To construct the HumanHap550, an additional ∼240,000 tag SNPs were selected from the Phase II HapMap data (release 20) and combined with the 313,505 HumanHap300 loci. Using the HumanHap300 tag SNP list as “must-haves,” an analysis was conducted using the full release 20 data in the CEU population (>2,100,000 SNPs with MAF ≥ 0.05), prioritizing tag SNP selection for those loci that were polymorphic in all HapMap populations. Again, SNP selection in the CEU population was done by choosing a higher density of tag SNPs within 10 kb of RefSeq genes and in ECRs (r² = 0.8 in gene regions/ECRs; r² = 0.7 in the rest of the genome). All tag SNPs were retained, with the exception of singleton bins (those SNPs not tagging any additional SNPs) that were not within 10 kb of a gene or in an ECR. An additional tag SNP was selected for those bins with 10 or more loci.
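The excerpt does not spell out the tag-selection algorithm itself, so the following is only a minimal sketch of one common approach: greedy binning on pairwise r², with “must-keep” SNPs seeded first. Two things here are assumptions rather than part of the protocol. The function names (`dosage_r2`, `pairwise_r2`, `greedy_tags`) are hypothetical, and r² is approximated as the squared Pearson correlation of 0/1/2 genotype dosages.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two 0/1/2 genotype vectors (assumed LD proxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def pairwise_r2(positions, genotypes, max_dist=200_000):
    """r2 for every SNP pair at most max_dist bp apart; positions must be sorted ascending."""
    r2 = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[j] - positions[i] > max_dist:
                break  # all later SNPs are even farther away
            r2[(i, j)] = dosage_r2(genotypes[i], genotypes[j])
    return r2


def greedy_tags(n_snps, r2, threshold=0.8, must_keep=()):
    """Pick tags greedily: each round takes the SNP that tags the most untagged SNPs."""
    # neighbours[i] holds every SNP that i tags at r2 >= threshold, itself included
    neighbours = {i: {i} for i in range(n_snps)}
    for (i, j), value in r2.items():
        if value >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)
    untagged = set(range(n_snps))
    tags = []
    for t in must_keep:  # forced tags are placed before any greedy choice
        tags.append(t)
        untagged -= neighbours[t]
    while untagged:
        best = max(sorted(untagged), key=lambda s: len(neighbours[s] & untagged))
        tags.append(best)
        untagged -= neighbours[best]
    return tags


# Toy data: SNPs 0 and 1 are perfectly correlated; SNP 3 copies SNP 2 but lies > 200 kb away.
positions = [100, 5_000, 10_000, 400_000]
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 0, 1, 2, 2, 1],
    [0, 0, 1, 2, 2, 1],
]
r2 = pairwise_r2(positions, genotypes)
print(greedy_tags(len(positions), r2, threshold=0.8))
print(greedy_tags(len(positions), r2, threshold=0.8, must_keep=(1,)))
```

Running the selection a second time with a lower threshold, passing the first pass's gene and ECR tags as `must_keep`, would mimic the two-pass scheme described above. The result is a denser tag set near genes and a sparser one elsewhere.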
After the core set of tag SNPs was determined in the CEU population, additional tag SNPs were included from the Han Chinese/Japanese (CHB + JPT; all bins >2 SNPs at r² = 0.8) and Yoruba (YRI; all bins >4 SNPs at r² = 0.7) populations. Additional content was added to the panel, including 7,779 nsSNPs, 177 mitochondrial SNPs (selected from http://www.broad.harvard.edu/mpg/tagger/mito.html), 4,284 SNPs in 495 reported copy number regions of the genome, and a higher density of tag SNPs in the MHC region. After this final list was selected, any gaps ≥ 100 kb between common SNPs for each population were filled with common SNPs for that particular population. The mean spacing between consecutive common SNPs on autosomal chromosomes is 5.5 kb, 6.5 kb, and 6.3 kb for CEU, CHB + JPT, and YRI, respectively.
To construct the HumanHap650Y, 100,000 additional YRI-specific tag SNPs were added to the 555,532 previously selected SNPs. Using the list of 555,532 tag SNPs as “must-haves,” an analysis was conducted using the release 20 data in the YRI population, and tag SNPs from the largest bins were selected (bins >2 SNPs, r² = 0.7). The mean spacing between consecutive common SNPs on autosomal chromosomes is 5.3 kb, 6.2 kb, and 5.4 kb across the genome in the CEU, CHB + JPT, and YRI populations, respectively.
To calculate coverage of HapMap or SeattleSNPs, pairwise r² values were calculated using the expectation algorithm, based on the genotypes from HapMap release 21 and the 68 genes resequenced in the PGA samples. For each SNP in a reference list (HapMap release 20 or the 68 PGA genes), the maximum r² with a SNP on the HumanHap550 or HumanHap650Y was calculated. All pairwise combinations within 200 kb were considered. For chrX, only female individuals were used; otherwise, all unrelated individuals were used. […]
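The coverage step can be illustrated with a short sketch: for each reference SNP, take the maximum r² against any panel SNP within 200 kb, then summarize. Several details are assumptions and not taken from the protocol. The helper names (`dosage_r2`, `max_r2_per_snp`, `coverage`) are hypothetical, and the 0.8 cutoff used to summarize coverage is assumed. r² is computed as the squared correlation of genotype dosages, not by the haplotype-frequency estimation the authors mention.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two 0/1/2 genotype vectors (assumed LD proxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def max_r2_per_snp(reference, panel, max_dist=200_000):
    """For each (position, genotypes) reference SNP, best r2 with any panel SNP within max_dist."""
    best_values = []
    for ref_pos, ref_geno in reference:
        best = 0.0
        for panel_pos, panel_geno in panel:
            if abs(panel_pos - ref_pos) <= max_dist:
                best = max(best, dosage_r2(ref_geno, panel_geno))
        best_values.append(best)
    return best_values


def coverage(max_r2_values, threshold=0.8):
    """Fraction of reference SNPs whose best panel r2 reaches the (assumed) threshold."""
    return sum(v >= threshold for v in max_r2_values) / len(max_r2_values)


# Toy data: the panel holds one SNP; it captures itself and its perfect proxy only.
g_a = [0, 1, 2, 0, 1, 2]
g_b = [0, 0, 1, 2, 2, 1]
reference = [(100, g_a), (5_000, g_a), (10_000, g_b), (400_000, g_b)]
panel = [(100, g_a)]

values = max_r2_per_snp(reference, panel)
print(values)
print(coverage(values))
```

A reference SNP that is itself on the panel scores r² = 1 against its own genotypes, so genotyped SNPs count as covered without any special casing.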

Pipeline specifications

Software tools: PHAST, Tagger
Application: GWAS