Computational protocol: Whole exome sequencing unveils a frameshift mutation in CNGB3 for cone dystrophy

Protocol publication

[…] The north Indian Muslim family (Fig. A), residing in Jaunpur, Uttar Pradesh, India, was recruited into our ongoing genetic study on RD, approved by the ethical committee of the Faculty of Science (currently known as the Institute of Science), Banaras Hindu University, Varanasi, India. A complete ophthalmologic evaluation, including family history, visual function testing, and retinal imaging, was carried out. The patients were referred to the All India Institute of Medical Sciences (AIIMS), New Delhi, India, for electroretinography (ERG) examination. After written informed consent was obtained, peripheral blood samples were taken from 8 family members: both parents (II.5 and II.6) and 6 siblings (III.1 to III.6). Genomic DNA was extracted from peripheral blood using a standard salting-out protocol. Initially, 3 family members, the mother (II.6) and 2 affected sisters (III.1 and III.5), were chosen for genetic investigation by whole exome sequencing. Exome enrichment was carried out with 3 μg of genomic DNA using the Agilent SureSelect Human All Exon V5 Kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer's protocol, followed by sequencing on the Illumina HiSeq 2000/2500 platform (Illumina, San Diego, CA). Each sample was sequenced to at least 100X raw target depth using 100 bp paired-end reads.
During bioinformatics analysis, we checked several parameters of the fastq files, including base quality score distribution, sequence quality score distribution, average base content per read, GC distribution in the reads, PCR amplification issues, over-represented sequences, and adapter trimming. Based on the quality of the fastq files, sequence reads were trimmed where necessary to retain only high-quality sequence for further analysis, and low-quality reads were excluded. Adapter trimming was performed using the fastq-mcf program (version 1.04.676).
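The quality-based trimming and read exclusion described above can be illustrated with a minimal Python sketch. It is not the fastq-mcf algorithm; it only shows one common approach, trimming low-quality bases from the 3' end and discarding reads that become too short. The Phred+33 encoding, the quality threshold of 20, and the minimum length of 30 are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def phred_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def trim_3prime(
    seq: str, qual: str, min_q: int = 20, min_len: int = 30
) -> Optional[Tuple[str, str]]:
    """Trim trailing bases with Phred score below min_q.

    Returns the trimmed (sequence, quality) pair, or None when the
    trimmed read is shorter than min_len and should be excluded.
    Both thresholds are illustrative, not values from the protocol.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end while the last kept base is low quality.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]
```

For example, a 10-base read whose last two bases have quality character `#` (Phred 2) is cut back to 8 bases when `min_len` allows it, and dropped otherwise.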
Paired-end reads were aligned to the primary assembly of the human reference genome hg19, downloaded from the UCSC database. Alignment was performed using the BWA program (version bwa-0.7.8), with quality trimming enabled through the -q 20 option. After alignment, only reads with mapping quality ≥20 and insert size between 100 bp and 1000 bp (inclusive) were taken forward for analysis. Read pairs whose mates mapped to 2 different chromosomes were discarded. Aligned reads were first sorted with the Picard SortSam command (picard-tools 1.115), and duplicate reads were removed using the Picard MarkDuplicates command. After duplicate removal, reads were realigned around the known indels provided by the GATK group. This was followed by base quality score recalibration, which took known variant positions into account to make the quality score of each base more accurate. After realignment, we used the UnifiedGenotyper of the GenomeAnalysisTKLite-2.3-9 toolkit (https://www.broadinstitute.org/) to identify single nucleotide variants (SNVs) and short indels. We further filtered the variants to retain only those of good quality (depth and variant score). Identified variants were annotated using an in-house program (VariMAT). The gene model used for annotation was downloaded from the Ensembl database, Release 75. Variant class prediction was done using VEP Release 75 (http://www.ensembl.org) and the internal annotation pipeline (VariMAT). [...]
To understand the possible consequences of the frameshift deletion on the 3-dimensional structure of the protein, we generated protein models of the wild-type and mutant CNGB3 protein. Because of the low sequence identity (20%) between CNGB3 and the template (PDB ID: 5K7L), we used the secondary structure-based threading server PHYRE to generate the CNGB3 model, taking the Eag1 crystal structure (PDB ID: 5K7L, chain A) as a template.
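Returning briefly to the alignment step, the read-pair filters stated there (mapping quality ≥20, insert size of 100 to 1000 bp, both mates on the same chromosome) can be written down as a small Python sketch. The `ReadPair` record and `keep_pair` function are hypothetical names, not part of BWA, Picard, or GATK. Two points are assumptions, since the protocol does not spell them out: the mapping quality threshold is applied to both mates, and the insert size is taken as the absolute value of the signed template length.

```python
from dataclasses import dataclass

MIN_MAPQ = 20      # from the protocol: mapping quality >= 20
MIN_INSERT = 100   # from the protocol: insert size >= 100 bp
MAX_INSERT = 1000  # from the protocol: insert size <= 1000 bp


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str        # chromosome of the first mate
    chrom2: str        # chromosome of the second mate
    mapq1: int         # mapping quality of the first mate
    mapq2: int         # mapping quality of the second mate
    insert_size: int   # signed template length, as in the SAM TLEN field


def keep_pair(pair: ReadPair) -> bool:
    """Return True if the pair passes all three filters."""
    # Discard pairs whose mates map to different chromosomes.
    if pair.chrom1 != pair.chrom2:
        return False
    # Assumption: both mates must reach the mapping quality threshold.
    if min(pair.mapq1, pair.mapq2) < MIN_MAPQ:
        return False
    # Assumption: the sign of the template length is ignored.
    return MIN_INSERT <= abs(pair.insert_size) <= MAX_INSERT
```

A pair such as `ReadPair("chr8", "chr8", 60, 60, 250)` would be kept, while the same pair with mates on `chr8` and `chr1` would be discarded.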
All sequence similarity searches were carried out in the MPI Bioinformatics Toolkit using HHpred with default settings. HHpred template searches were performed against a database comprising PDB70 (Protein Data Bank structures, as available on January 19, 2017, clustered at 70% sequence identity). The Chimera tool was used for visualization and analysis of the modeled protein structure. […]
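Sequence identity figures such as the 20% quoted for CNGB3 and its template depend on how identity is counted. As a rough illustration only, the sketch below computes percent identity from two already-aligned sequences, counting identical residues over the columns where neither sequence has a gap. This is one of several conventions and is not necessarily the one used by HHpred or PHYRE; the function name and the toy sequences in the usage note are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns without gaps.

    Both inputs must be rows of the same pairwise alignment, so they
    must have equal length. Columns with a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns do not count toward the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

With the toy alignment `percent_identity("ACDE-FG", "ACNEKF-")`, five columns are compared and four match, giving 80%.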

Pipeline specifications

Software tools: Phyre, MPI Bioinformatics Toolkit, HHpred
Application: Protein structure analysis
Chemicals: Nucleotides