Computational protocol: A comprehensive analysis of common genetic variation in prolactin (PRL) and PRL receptor (PRLR) genes in relation to plasma prolactin levels and breast cancer risk: the Multiethnic Cohort

Protocol publication

[…] We sequenced the exons and splice-site regions of PRL and PRLR in germline DNA from 95 advanced breast cancer cases (19 from each of the five racial/ethnic groups). We used DNA from advanced cases to increase the probability of discovering single nucleotide polymorphisms (SNPs) that are biologically relevant to breast cancer. Sequencing was performed with ABI BigDye terminator chemistry on the ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA). Polymorphisms were identified with the PolyPhred program, followed by manual review by at least two observers, and all putative coding variants were validated by genotyping in the same panel of advanced cases and in the multiethnic panel (discussed below). […] We used a haplotype-based approach, described previously [], to study common variation in PRL and PRLR in the MEC. We selected SNPs from both public (National Center for Biotechnology Information []) and private (Celera []) databases to construct high-density SNP maps extending up to 20 kilobases (kb) upstream of the transcription initiation site and 10 kb downstream of the last exon of each gene, for a total coverage of 59 kb in PRL and 210 kb in PRLR. Block structure was assessed using SNPs with a minor allele frequency (MAF) ≥ 10%. Blocks were initially defined by aligning them across racial/ethnic groups; each border was set by the SNP at the extreme end of the block in any one ethnic group, except for African-Americans, whose blocks, as expected, were modestly smaller than those of the other groups.
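The MAF filter and cross-group border alignment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as 0/1/2 copies of one allele (None for a failed call), the 10% threshold is assumed to be applied within one ethnic group at a time, and "border set by the extreme SNP in any one group" is read as taking the outermost start and end positions across all groups except African-Americans. The function names, group labels, and positions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Genotypes = List[Optional[int]]  # 0/1/2 copies of one allele per woman; None = failed call


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """MAF computed from called genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def block_snps(snps: Dict[str, Genotypes], min_maf: float = 0.10) -> List[str]:
    """SNPs (within one ethnic group) that meet the MAF threshold for block definition."""
    return [name for name, g in snps.items() if minor_allele_frequency(g) >= min_maf]


def aligned_block(group_spans: Dict[str, Tuple[int, int]],
                  exclude: Tuple[str, ...] = ("African-American",)) -> Tuple[int, int]:
    """Hypothetical alignment rule: each border of the aligned block is the
    outermost SNP position observed in any included (non-excluded) group."""
    spans = [span for group, span in group_spans.items() if group not in exclude]
    return min(start for start, _ in spans), max(end for _, end in spans)


if __name__ == "__main__":
    panel = {"rsA": [0, 1, 2, 1, 0], "rsB": [0, 0, 0, 1, 0], "rsC": [0, 0, None, 0, 0]}
    print(block_snps(panel))  # ['rsA', 'rsB']; rsC is monomorphic
    print(aligned_block({"Japanese American": (1000, 5000), "Latino": (800, 4800),
                         "White": (1200, 5200), "African-American": (1500, 4000)}))
```

In the example the aligned block runs from 800 to 5200; the smaller African-American span is ignored because its label matches the `exclude` entry exactly.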
We tested the suitability of this block definition by evaluating whether SNPs surrounding presumed block borders changed the number or identity of common haplotypes estimated within the blocks. A change in the number of haplotypes, or the appearance of recombinant haplotypes, indicated that the SNPs spanned a potentially important site of historical recombination and guided us in redefining the block boundary.
We genotyped common SNPs (MAF > 5% in at least one racial/ethnic group) at an average density of about one SNP per kb across each locus, all missense SNPs known from public databases, and all new missense SNPs identified in our sequencing effort. In total, 139 (PRL) and 276 (PRLR) SNPs were selected and genotyped in a multiethnic panel of 349 women in the MEC without a history of cancer (n = 69–70 per racial/ethnic group). This sample size provides > 99% power to detect common haplotypes (≥ 5% frequency) that are shared across all ethnic groups and about 90% power to detect common ethnic-specific haplotypes. Of these SNPs, 36 (PRL) and 74 (PRLR) were monomorphic, and 17 (PRL) and 22 (PRLR) genotyped poorly (missing genotype data for ≥ 25% of samples, or out of Hardy-Weinberg equilibrium (p ≤ 0.01) in more than one population). This left 80 (PRL) and 173 (PRLR) SNPs with MAF ≥ 5% in at least one racial/ethnic group for the haplotype analysis. The |D′| and r² statistics were used to assess pairwise linkage disequilibrium (LD) between the common SNPs. Within regions of strong LD [], haplotype frequencies were estimated from the genotype data in the multiethnic panel (one ethnic group at a time) using the expectation-maximization (E-M) algorithm of Excoffier and Slatkin []. The squared correlation (Rh²) between the true haplotypes (h) and their estimates was then calculated as described by Stram et al. [].
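The power figures above depend on how "detect" is defined, which the text does not spell out. One common reading, assumed here, is the binomial probability that a haplotype of population frequency f appears at least k times among the 2n chromosomes sampled. The sketch below computes that probability; the function name and the choice of k are illustrative assumptions, and the code is not claimed to reproduce the reported 99% and 90% values.

```python
from math import comb


def prob_observed(freq: float, n_chromosomes: int, min_copies: int = 1) -> float:
    """Binomial probability that a haplotype with population frequency `freq`
    is seen at least `min_copies` times among `n_chromosomes` chromosomes."""
    p_fewer = sum(
        comb(n_chromosomes, k) * freq**k * (1 - freq) ** (n_chromosomes - k)
        for k in range(min_copies)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    whole_panel = 2 * 349  # chromosomes in the full multiethnic panel
    one_group = 2 * 70     # chromosomes in one racial/ethnic group
    for k in (1, 2, 4):
        print(k, prob_observed(0.05, whole_panel, k), prob_observed(0.05, one_group, k))
```

With k = 1, the probability of seeing a 5% haplotype exceeds 0.99 both for the whole panel (698 chromosomes) and for a single group of 70 women (140 chromosomes); requiring more copies lowers the probability.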
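The SNP quality-control rules in the protocol (drop monomorphic SNPs, SNPs missing genotypes for ≥ 25% of samples, and SNPs out of Hardy-Weinberg equilibrium at p ≤ 0.01 in more than one population; keep SNPs with MAF ≥ 5% in at least one group) can be expressed as a single filter. The text does not say which Hardy-Weinberg test was used, so this sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test. The genotype coding (0/1/2 copies of allele B, None for a failed call) and all function names are hypothetical.

```python
from math import erfc, sqrt
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # 0/1/2 copies of allele B per woman; None = failed call


def _called(genotypes: Genotypes) -> List[int]:
    return [g for g in genotypes if g is not None]


def maf(genotypes: Genotypes) -> float:
    called = _called(genotypes)
    if not called:
        return 0.0
    f = sum(called) / (2 * len(called))
    return min(f, 1 - f)


def hwe_p(genotypes: Genotypes) -> float:
    """Chi-square (1 df) Hardy-Weinberg test on called genotypes (assumed test)."""
    called = _called(genotypes)
    n = len(called)
    f = sum(called) / (2 * n) if n else 0.0
    if f in (0.0, 1.0):
        return 1.0  # no variation, so the test is uninformative
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * (1 - f) ** 2, 2 * n * f * (1 - f), n * f**2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution, 1 df


def passes_qc(by_group: Dict[str, Genotypes], max_missing: float = 0.25,
              hwe_alpha: float = 0.01, min_maf: float = 0.05) -> bool:
    everyone = [g for gs in by_group.values() for g in gs]
    called = _called(everyone)
    if not called or sum(called) in (0, 2 * len(called)):
        return False  # never called, or monomorphic across the panel
    if (len(everyone) - len(called)) / len(everyone) >= max_missing:
        return False  # missing genotype data for >= 25% of samples
    if sum(hwe_p(gs) <= hwe_alpha for gs in by_group.values()) > 1:
        return False  # out of HWE in more than one population
    return any(maf(gs) >= min_maf for gs in by_group.values())
```

Note that a SNP failing Hardy-Weinberg in only one group still passes, matching the "more than one population" rule.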
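For a single pair of biallelic SNPs, the Excoffier-Slatkin E-M approach and the |D′| and r² statistics take a compact form. Only double heterozygotes have ambiguous phase, and the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates. The sketch below is a two-SNP illustration, not the software the authors used (which handled multi-SNP haplotypes); the genotype coding (0/1/2 copies of allele 1) and the function names are assumptions.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Hap = Tuple[int, int]  # (allele at SNP 1, allele at SNP 2), each coded 0/1
HAPS: List[Hap] = [(0, 0), (0, 1), (1, 0), (1, 1)]


def _allele_pairs(g: int) -> List[Tuple[int, int]]:
    """Ordered allele pairs consistent with a 0/1/2 genotype at one SNP."""
    return {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}[g]


def _phase_options(g1: int, g2: int) -> List[Tuple[Hap, Hap]]:
    """Unordered haplotype pairs consistent with a two-SNP genotype."""
    options = set()
    for (a1, a2), (b1, b2) in product(_allele_pairs(g1), _allele_pairs(g2)):
        options.add(tuple(sorted([(a1, b1), (a2, b2)])))
    return sorted(options)


def em_haplotypes(genotypes: Sequence[Tuple[Optional[int], Optional[int]]],
                  tol: float = 1e-10, max_iter: int = 1000) -> Dict[Hap, float]:
    data = [_phase_options(g1, g2) for g1, g2 in genotypes
            if g1 is not None and g2 is not None]
    if not data:
        raise ValueError("no complete two-SNP genotypes")
    freq = {h: 0.25 for h in HAPS}  # uniform starting values
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPS}
        for options in data:
            # E-step: weight each phase resolution by its probability under current freqs.
            w = [freq[h] * freq[k] * (1 if h == k else 2) for h, k in options]
            total = sum(w)
            for (h, k), wi in zip(options, w):
                counts[h] += wi / total
                counts[k] += wi / total
        # M-step: new frequencies are expected haplotype counts over 2N chromosomes.
        new = {h: counts[h] / (2 * len(data)) for h in HAPS}
        done = max(abs(new[h] - freq[h]) for h in HAPS) < tol
        freq = new
        if done:
            break
    return freq


def ld_stats(freq: Dict[Hap, float]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two SNPs from their haplotype frequencies."""
    p1 = freq[(1, 0)] + freq[(1, 1)]  # frequency of allele 1 at SNP 1
    p2 = freq[(0, 1)] + freq[(1, 1)]  # frequency of allele 1 at SNP 2
    var = p1 * (1 - p1) * p2 * (1 - p2)
    if var == 0:
        raise ValueError("LD is undefined when either SNP is monomorphic")
    d = freq[(1, 1)] - p1 * p2
    if d > 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    return abs(d) / d_max, d * d / var
```

For example, 25 women with genotypes (0, 0), 50 with (1, 1), and 25 with (2, 2) converge to haplotypes 00 and 11 at frequency 0.5 each, giving |D′| and r² close to 1.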
"Tagging" SNPs (tagSNPs) for the case-control study were then chosen by finding, for each ethnic group, the minimum set of SNPs giving Rh² > 0.7 for all common haplotypes (estimated frequency ≥ 5%). TagSNP selection was performed with the tagSNPs program []. Multi-marker and pairwise R² values between tagSNPs and unmeasured SNPs were calculated using the Tagger algorithm [] in Haploview and the slightly more general method of Stram (2004) []. […]
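The tagSNP criterion (the smallest SNP set giving Rh² > 0.7 for every haplotype with frequency ≥ 5%) can be illustrated with a simplified calculation. Stram et al.'s Rh² is defined on unphased genotypes; to keep the sketch short, it assumes phase-known haplotype frequencies. In that case the squared correlation between carrying haplotype h (frequency p) and its best predictor from the tag alleles reduces to p(1 − q) / (q(1 − p)), where q is the total frequency of haplotypes identical to h at the tag SNPs. The exhaustive subset search is also an assumption made for clarity; it is not necessarily how the tagSNPs program searches, and it is only practical for small blocks. Function names and the example block are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple


def haplotype_r2(haps: Dict[str, float], target: str, tags: Sequence[int]) -> float:
    """Phase-known simplification of Rh^2 for one haplotype given a set of tag SNPs.

    `haps` maps haplotype strings (one character per SNP) to frequencies summing to 1.
    """
    p = haps[target]
    if p >= 1.0:
        return 1.0
    pattern = tuple(target[i] for i in tags)
    # q: total frequency of haplotypes sharing the target's alleles at every tag SNP.
    q = sum(f for h, f in haps.items() if tuple(h[i] for i in tags) == pattern)
    return p * (1 - q) / (q * (1 - p))


def min_tag_set(haps: Dict[str, float], threshold: float = 0.7,
                common: float = 0.05) -> Optional[Tuple[int, ...]]:
    """Smallest set of SNP indices with R^2 > threshold for every common haplotype,
    found by exhaustive search over subsets of increasing size."""
    n_snps = len(next(iter(haps)))
    targets = [h for h, f in haps.items() if f >= common]
    for size in range(n_snps + 1):
        for tags in combinations(range(n_snps), size):
            if all(haplotype_r2(haps, h, tags) > threshold for h in targets):
                return tags
    return None


if __name__ == "__main__":
    example = {"0000": 0.40, "1100": 0.30, "1111": 0.25, "0011": 0.05}  # hypothetical block
    print(min_tag_set(example))  # (0, 2)
```

In the example, SNP 0 alone cannot separate haplotype 0011 from 0000 (R² ≈ 0.06 for 0011), whereas SNPs 0 and 2 together distinguish all four haplotypes.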

Pipeline specifications

Software tools: PolyPhred, SNPinfo, Tagger, Haploview
Applications: Sanger sequencing, GWAS
Organisms: Homo sapiens
Diseases: Breast Neoplasms; Metabolism, Inborn Errors