Computational protocol: Genome-Wide Association Study in Bipolar Patients Stratified by Co-Morbidity

Protocol publication

[…] Genotyping was carried out by the Broad Institute Center for Genotyping and Analysis using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA). Cases and controls were genotyped separately to facilitate sharing of controls with the Genome-Wide Association Study of Schizophrenia (dbGaP; Study Accession: phs000021.v2.p1). The identity of the samples, including gender identity, had been checked at Rutgers University by genotyping a 24-SNP panel on the Sequenom iPLEX platform. A detailed description of the extensive quality control applied to the data before public release can be found elsewhere. Even after those quality control measures, significant batch effects due to non-randomization of cases and controls were present in the publicly available data. Therefore, in this analysis, we recalled the genotypes from the raw signal intensity measurements using the corrected robust linear model with maximum likelihood distance (CRLMM) algorithm and the BEAGLECALL algorithm. CRLMM was run with the Oligo 1.6.0 package from Bioconductor 2.3 (http://www.bioconductor.org/download/oldrelease/BioC2.3/). All samples were run together in a single computational batch using default parameters. The BEAGLECALL methodology was recently introduced to improve the accuracy of genotype calls through the use of haplotype phase information and linkage disequilibrium (LD) structure. We used BEAGLECALL version 0.9.4, which invokes the haplotype phasing methods of BEAGLE version 3.1. BEAGLECALL requires a matrix of genotype probabilities for the 3 genotypes as well as normalized A and B allele intensities. CRLMM calls were used as the starting point for BEAGLECALL: all calls were made regardless of confidence, and the initial probability of each called genotype was set to one, with the probabilities of the other two genotypes set to zero. 
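The initialization described above (each CRLMM call given probability one, the two alternatives zero) can be illustrated with a minimal sketch. The 0/1/2 genotype encoding (AA, AB, BB), the function name, and the toy calls are assumptions made for illustration. This is not the BEAGLECALL input file format, only the idea behind the starting probabilities.

```python
from typing import List, Sequence

# Assumed (hypothetical) genotype codes: 0 = AA, 1 = AB, 2 = BB.
GENOTYPES = (0, 1, 2)


def one_hot_genotype_probs(calls: Sequence[int]) -> List[List[float]]:
    """Convert hard genotype calls into rows of [P(AA), P(AB), P(BB)].

    The called genotype gets probability 1.0 and the other two get 0.0,
    mirroring the starting point described in the text.
    """
    rows = []
    for call in calls:
        if call not in GENOTYPES:
            raise ValueError(f"unexpected genotype code: {call!r}")
        rows.append([1.0 if g == call else 0.0 for g in GENOTYPES])
    return rows


# Toy example: hard calls for four samples at one SNP.
crlmm_calls = [0, 2, 1, 1]
initial_probs = one_hot_genotype_probs(crlmm_calls)
```

Each row sums to one by construction, so the matrix is a valid, if maximally confident, set of genotype probabilities that later iterations can refine.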
Autosomal allele intensities were extracted from the raw CEL files and quantile normalized using Golden Helix SNP & Variation Suite (SVS) software (http://www.goldenhelix.com). Three iterations of BEAGLECALL were run with the recommended call thresholds of 0.8 for the first iteration, 0.96 for the second and 0.97 for the third. We used the default BEAGLECALL Hardy-Weinberg quality control threshold of 10^-6. Genotype calls with confidence less than 0.97 were set as missing. Our analysis was limited to autosomal chromosomes, since the BEAGLECALL software currently does not support genotype calls for X and Y chromosomal markers in male individuals. Starting with 868,157 autosomal calls from CRLMM, BEAGLECALL provided 796,664 high-quality autosomal SNP calls. After eliminating SNPs with minor allele frequency (MAF) < 0.001, the final dataset consisted of 728,331 autosomal SNPs. When association studies were performed across all combined samples with case/control status as the phenotype, the Q-Q plots and Manhattan plots indicated little evidence of spurious associations due to batch effects. A total of 1,000 cases and 1,034 controls had genotypes and phenotypes that met quality control standards, and those were included in the genetic analysis using the latent class membership probability as the phenotype. We tested for association under the dominant, additive and recessive models. Association analyses were performed with the SVS software from Golden Helix. Correction for population stratification and outliers in the genotype data was performed with principal component analysis. Q-Q plots were examined for each analysis, and cluster plots were manually examined for all significant findings. Genome-wide significant results were retested with 1,000 permutations. 
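The call-confidence filter, the MAF filter and the three genetic models named above can be sketched as follows. The thresholds 0.97 and 0.001 come from the text. The function names, the coding of genotypes as counts of the B allele (0, 1, 2), the treatment of B as the tested allele, and the toy data are assumptions for illustration. The study itself used BEAGLECALL and Golden Helix SVS, whose internals are not reproduced here.

```python
from typing import List, Optional, Sequence

CALL_THRESHOLD = 0.97  # from the text: calls below this confidence are set to missing
MAF_THRESHOLD = 0.001  # from the text: SNPs with MAF below this are eliminated

Genotype = Optional[int]  # count of B alleles (0, 1, 2), or None if missing


def mask_low_confidence(calls: Sequence[int],
                        confidences: Sequence[float],
                        threshold: float = CALL_THRESHOLD) -> List[Genotype]:
    """Set calls whose confidence is below the threshold to missing (None)."""
    return [c if p >= threshold else None for c, p in zip(calls, confidences)]


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency over non-missing calls (0.0 if none observed)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    b_freq = sum(observed) / (2 * len(observed))
    return min(b_freq, 1.0 - b_freq)


def passes_maf(calls: Sequence[Genotype],
               threshold: float = MAF_THRESHOLD) -> bool:
    """Keep a SNP unless its MAF is below the threshold."""
    return minor_allele_frequency(calls) >= threshold


def encode(calls: Sequence[Genotype], model: str) -> List[Genotype]:
    """Recode B-allele counts under an additive, dominant or recessive model."""
    coders = {
        "additive": lambda g: g,                    # 0, 1, 2 copies
        "dominant": lambda g: 1 if g >= 1 else 0,   # at least one copy
        "recessive": lambda g: 1 if g == 2 else 0,  # two copies only
    }
    if model not in coders:
        raise ValueError(f"unknown genetic model: {model!r}")
    code = coders[model]
    return [None if g is None else code(g) for g in calls]


# Toy data for one SNP across six samples (hypothetical values).
raw_calls = [0, 1, 2, 1, 0, 2]
confidences = [0.99, 0.98, 0.90, 0.97, 1.0, 0.999]

masked = mask_low_confidence(raw_calls, confidences)
maf = minor_allele_frequency(masked)
dominant = encode(masked, "dominant")
recessive = encode(masked, "recessive")
```

In this toy example only the third call falls below 0.97 and becomes missing. The recoded vectors would then serve as the predictor in whichever association test is applied to the phenotype.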
In addition, we re-genotyped all genome-wide significant SNPs with the TaqMan® SNP Genotyping Assay from Applied Biosystems, run on the ABI 7900 Fast Real-Time PCR System (Life Technologies Corporation, Carlsbad, CA, USA) according to the published protocol (http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042998.pdf). The genome browser used for the bioinformatics analysis was the Ensembl database, assembly GRCh37.p2, Feb 2009, Version 60.37e. […]

Pipeline specifications

Software tools: CRLMM, BEAGLE, SVS
Databases: dbGaP
Applications: GWAS, Genome data visualization
Organisms: Homo sapiens
Diseases: Alcoholism; Psychoses, Substance-Induced; Genetic Diseases, Inborn