Computational protocol: A burden of rare variants in BMPR2 and KCNK3 contributes to a risk of familial pulmonary arterial hypertension

Protocol publication

[…] To understand the comprehensive genetic background of these 9 PAH families, we applied whole-exome or whole-genome sequencing to 12 patients and 5 healthy family members whose DNA was available (Fig.  and Additional file : Table S1). For exome sequencing, DNA fragments were enriched with SureSelectXT Human All Exon v4 + UTR (Agilent Technologies, Santa Clara, CA, USA) and then sequenced on a SOLiD™ 5500XL sequencer (Thermo Fisher Scientific Inc., Waltham, MA, USA). Whole-genome sequencing was conducted with the Illumina HiSeq X sequencer (Illumina Inc., San Diego, CA, USA). After the sequence reads were aligned to the reference genome (NCBI Build 37) using the Burrows-Wheeler Aligner [], downstream processing, including duplicate removal, recalibration of base quality values, local realignment, variant calling, and variant quality score recalibration, was performed using GATK []. The variants were called together with an exome sequencing data set of 300 control samples obtained from the Human Genome Variation Database (accession ID: HGV0000004) []. The resulting VCF file has been deposited in the same database under accession HGV0000005. [...] After removing variants that were flagged as low quality by the GATK VariantRecalibrator, additional filters were applied to retain only high-quality variants. Variants were removed for low call rate (< 0.9), excessive strand bias (FS > 50), high haplotype score (> 5), deviation from Hardy-Weinberg equilibrium (InbreedingCoeff > 0.3), low mapping quality of the reads (MQ < 35), excess of zero mapping quality reads (MQ0 > 100), bias of mapping quality between reference and alternative alleles (MQRankSum < 13), low coverage per sample (DP/sample < 10), positional bias of the reads (ReadPosRankSum > 5), low quality over depth (QD < 8), or low LOD score (VQSLOD < 0).
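
The additional filters listed above can be sketched as a small table of rules applied to each variant's annotations. This is a minimal illustration, not the authors' script: the helper names (`parse_info`, `call_rate`, `failed_filters`) and the keys `call_rate` and `depth_per_sample` are hypothetical, and the example record is made up. Thresholds are copied as printed; the MQRankSum and ReadPosRankSum cutoffs in particular should be checked against the original publication before reuse. Annotations that are absent from a record are skipped rather than treated as failures, which is an assumption of this sketch.

```python
import operator

# (annotation key, comparison that marks a variant for removal, threshold).
# Thresholds are copied as printed in the protocol text; "call_rate" and
# "depth_per_sample" are hypothetical keys computed by the caller.
FILTERS = [
    ("call_rate", operator.lt, 0.9),
    ("FS", operator.gt, 50),
    ("HaplotypeScore", operator.gt, 5),
    ("InbreedingCoeff", operator.gt, 0.3),
    ("MQ", operator.lt, 35),
    ("MQ0", operator.gt, 100),
    ("MQRankSum", operator.lt, 13),
    ("depth_per_sample", operator.lt, 10),
    ("ReadPosRankSum", operator.gt, 5),
    ("QD", operator.lt, 8),
    ("VQSLOD", operator.lt, 0),
]


def parse_info(info_field):
    """Parse a VCF INFO string into a dict; flag entries without '=' are skipped."""
    parsed = {}
    for item in info_field.split(";"):
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        try:
            parsed[key] = float(value)
        except ValueError:
            parsed[key] = value  # keep non-numeric values as strings
    return parsed


def call_rate(genotypes):
    """Fraction of samples with a fully called genotype (no '.' allele)."""
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if "." not in gt)
    return called / len(genotypes)


def failed_filters(annotations, filters=FILTERS):
    """Return the keys of all filters that the variant fails, in table order."""
    failed = []
    for key, is_bad, threshold in filters:
        value = annotations.get(key)
        if not isinstance(value, (int, float)):
            continue  # missing or non-numeric annotation: not evaluated
        if is_bad(value, threshold):
            failed.append(key)
    return failed


# Made-up record with four samples, one of them uncalled.
genotypes = ["0/1", "0/0", "./.", "1/1"]
info = parse_info("DP=340;FS=61.2;MQ=58.1;QD=12.4;VQSLOD=3.1")
info["call_rate"] = call_rate(genotypes)
info["depth_per_sample"] = info["DP"] / len(genotypes)
print(failed_filters(info))  # prints ['call_rate', 'FS']
```

Keeping the rules in a table makes it easy to correct a single threshold without touching the filtering logic.
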
For the gene-based association analysis, we selected likely protein-damaging variants (premature termination, splice-site, and missense variants, and exonic indels) and performed the Variable Threshold (VT) test [] as implemented in Variant Association Tools []. [...] All identified variants were annotated using ANNOVAR []. Candidate pathogenic variants were screened according to their registration status and frequencies in the following public databases: dbSNP (Build 147) [], the 1000 Genomes (November 2010 data release) [], the 10Gen Data Set (version 1.04) [], NHLBI GO Exome Sequencing Project (ESP6500SI) [], the Human Genetic Variation Database [], and ClinVar []. For missense variants, PolyPhen-2 [], MutationTaster [], LRT [], and PhyloP [] scores were obtained from the dbNSFP database []. The damaging effects of splice-site variants were evaluated with MaxEntScan [] and Human Splicing Finder []. […]
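
To illustrate the idea behind a variable-threshold burden test, the sketch below scans every allele-frequency cutoff for one gene, scores the excess of variant carriers among cases at each cutoff, keeps the maximum score, and assesses it by permuting phenotype labels. It is a simplified toy version, not the Variant Association Tools implementation used in the protocol: the statistic is a per-sample burden score chosen for brevity, the function names are hypothetical, and the genotypes and allele frequencies are invented.

```python
import math
import random


def vt_statistic(genotypes, phenotypes, freqs):
    """Maximum burden score over all allele-frequency thresholds.

    genotypes:  one list per sample of alternate-allele counts, one entry per variant
    phenotypes: 1 for case, 0 for control, one entry per sample
    freqs:      allele frequency of each variant
    """
    mean_pheno = sum(phenotypes) / len(phenotypes)
    centered = [p - mean_pheno for p in phenotypes]
    best = None
    for threshold in sorted(set(freqs)):
        keep = [i for i, f in enumerate(freqs) if f <= threshold]
        # Per-sample count of alternate alleles at variants below the cutoff.
        burden = [sum(sample[i] for i in keep) for sample in genotypes]
        denom = math.sqrt(sum(b * b for b in burden))
        if denom == 0:
            continue  # no carriers at this cutoff
        score = sum(b * c for b, c in zip(burden, centered)) / denom
        if best is None or score > best:
            best = score
    return 0.0 if best is None else best


def vt_permutation_p(genotypes, phenotypes, freqs, n_perm=1000, seed=0):
    """One-sided permutation p-value for the maximum score."""
    rng = random.Random(seed)
    observed = vt_statistic(genotypes, phenotypes, freqs)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, shuffled, freqs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy data: 6 cases followed by 6 controls, 3 variants in one gene.
toy_genotypes = [
    [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],
    [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
toy_phenotypes = [1] * 6 + [0] * 6
toy_freqs = [0.001, 0.002, 0.05]

score, p_value = vt_permutation_p(toy_genotypes, toy_phenotypes, toy_freqs)
print(f"max score = {score:.3f}, permutation p = {p_value:.4f}")
```

Because the maximum is taken over thresholds, the same maximisation is repeated inside every permutation, so the p-value already accounts for having tried several cutoffs.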

Pipeline specifications

Software tools: BWA, GATK, VAT, ANNOVAR, PolyPhen, PHAST, MaxEntScan, HSF
Databases: dbNSFP, ClinVar, dbSNP, HGVD, HGVbase
Applications: WGS analysis, WES analysis, GWAS
Organisms: Homo sapiens
Chemicals: Nucleotides, Potassium