Computational protocol: Group Selection and Contribution of Minority Variants during Virus Adaptation Determines Virus Fitness and Phenotype

Protocol publication

[…] RNA was extracted from 5×10^8 virions of each passaged sample and amplified by RT-PCR, using Superscript III for reverse transcription and Phusion for PCR, with primer sets that covered the whole genome in 3–4 kb fragments. For consensus sequencing, the resulting PCR products were purified, sequenced and analyzed using Lasergene software (DNAStar Inc). For deep sequencing, PCR fragments were purified with the Nucleospin Gel and PCR Clean-up kit (Macherey-Nagel) and total DNA was quantified by NanoDrop. PCR products were then fragmented (Fragmentase), ligated to Illumina multiplex adapters, clustered on an Illumina cBot and sequenced on a GAIIx.

Sequences were demultiplexed by CASAVA with no mismatches permitted. Clipping was performed using the fastq-mcf tool, removing common adapter contaminants and trimming low-quality bases (Phred<30). Clipped reads were aligned to the Coxsackie virus B3 Nancy reference sequence with a maximum of 2 mismatches per read, and no gaps, using BWA v0.5.9. Alignments were processed using SAMtools to obtain a pileup of the called bases at each position.

An in-house pipeline, termed ViVAN (Viral Variant ANalysis), was used to identify statistically significant variants above the background noise due to sequencing error at every sufficiently covered site (>100x). Briefly, for each position throughout the viral genome, the base identities and their quality scores were gathered. Each candidate variant was tested with a generalized likelihood-ratio test (used to determine the total number of minority variants), and its allele rate was adjusted according to the qualities of its covering reads using maximum likelihood estimation. Additionally, a confidence interval was calculated for each allele rate. To correct for multiple testing, a Benjamini-Hochberg false-discovery rate of 5% was applied. The total allele rates passing these criteria, across the whole genome, were used to calculate the mean variation rates (diversity) at different passages.
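The multiple-testing correction mentioned above can be illustrated with a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. This is not the ViVAN implementation: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the sketch assumes one p-value per candidate variant has already been produced by the likelihood-ratio test.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it is significant at FDR `alpha`.

    Standard step-up rule: sort the m p-values, find the largest rank k
    (1-based) with p_(k) <= (k / m) * alpha, and accept every p-value
    whose rank is <= k.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank that satisfies the Benjamini-Hochberg inequality.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Everything ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    # Hypothetical p-values for ten candidate variants (illustration only).
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

With a 5% false-discovery rate, only the candidates flagged `True` would contribute their allele rates to the diversity calculation; in the protocol this filter applies only to sites with more than 100x coverage.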
The variation rate at position i, denoted V_i, is defined as the sum of the proportions (F) of the k significant non-reference alleles at that position:

$$V_i = \sum_{j=1}^{k} F_{ij}$$

The region-wide variation rate V is the average of V_i across all n covered positions in the genome:

$$V = \frac{\sum_{i=1}^{n} V_i}{n}$$

[…]
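As a worked illustration of the two definitions above, the following Python sketch computes the per-position variation rate and its average over covered positions. The function names and the input layout (one list of significant non-reference allele frequencies per covered position) are assumptions made for this example, not part of ViVAN.

```python
def site_variation_rate(allele_freqs):
    """V_i: sum of the frequencies F_ij of the significant
    non-reference alleles at one position."""
    return sum(allele_freqs)


def mean_variation_rate(sites):
    """V: average of V_i over all n covered positions.

    `sites` holds one list per covered position; a position with no
    significant variant is represented by an empty list and contributes
    V_i = 0 while still counting towards n.
    """
    n = len(sites)
    if n == 0:
        raise ValueError("at least one covered position is required")
    return sum(site_variation_rate(freqs) for freqs in sites) / n


if __name__ == "__main__":
    # Hypothetical data: four covered positions, two of them variable.
    sites = [[0.02, 0.01], [], [0.05], []]
    print(mean_variation_rate(sites))  # about 0.02, i.e. (0.03 + 0.05) / 4
```

Positions without significant variants are kept in the denominator, which matches the definition of n as the number of covered positions rather than the number of variable ones.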

Pipeline specifications

Software tools: BaseSpace, ea-utils, BWA, SAMtools, ViVan
Application: Genome annotation
Diseases: Communicable Diseases, Emerging