Computational protocol: Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis

Protocol publication

[…] For the analysis of the brain biopsy sequencing data, we implemented the following steps. We first removed duplicate sequences, which can arise from PCR amplification, with an in-house script that collapses pairs of reads based on sequence identity, using 90 % of the sequence as the signature (20 % of the reads were removed as duplicates). Half of the reads overlapped with their mates within pairs, so we merged the overlapping reads using PEAR, taking into account both sequence match and quality scores. We performed quality control using PrinSeq, trimming low-quality ends and removing reads with an average quality below 15. We subsequently removed human sequences using a fast aligner (Novoalign version V2.07.13, human reference genome GRCh37) as well as BLASTn. We performed de novo assembly of high-quality contigs using Velvet (kmer = 81). Finally, we annotated the contigs and the unassembled reads against a custom protein reference database using BLASTx. This database consists of viral, bacterial, human and mouse RefSeq proteins. More specifically, it includes all known viruses in the RefSeq collection (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz) as well as all the bacteria of the human microbiome, according to ftp://ftp.ncbi.nih.gov/genomes/HUMAN_MICROBIOM/Bacteria/all.faa.tar.gz. The BLASTx results were the input to metaMix.
For the analysis of the vaccine sequencing data, we first removed 15 % of the reads as duplicates and merged overlapping reads. We trimmed the reads based on base quality (q = 20) using Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). We then selected Jeryl Lynn vaccine strain reads using BLASTn and a Jeryl Lynn nucleotide reference database.
We performed de novo assembly with SPAdes, followed by assembly evaluation with QUAST, for both the vaccine and the brain sequencing data. In the latter case, we used the MuV reads as identified by metaMix.
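The duplicate-removal step described above relies on an in-house script that is not published with the protocol. The following is only a minimal sketch of one way such a collapse could work, assuming that the "signature" is the leading 90 % of each mate and that the first pair seen for a given signature is kept. The function names, the `percent` parameter and the toy reads are hypothetical and are not taken from the original script.

```python
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (forward read sequence, reverse read sequence)


def signature(seq: str, percent: int = 90) -> str:
    """Return the leading `percent` of a read, used as its duplicate signature."""
    # Integer arithmetic avoids floating-point rounding on the cut position.
    return seq[: len(seq) * percent // 100]


def collapse_duplicates(pairs: Iterable[ReadPair], percent: int = 90) -> Iterator[ReadPair]:
    """Yield the first read pair seen for each combination of mate signatures."""
    seen = set()
    for fwd, rev in pairs:
        key = (signature(fwd, percent), signature(rev, percent))
        if key not in seen:
            seen.add(key)
            yield fwd, rev


# Hypothetical toy data: the second pair differs from the first only in the
# final base of each mate, so it shares the same signature and is collapsed.
pairs = [
    ("ACGTACGTAC", "TTGGCCAATT"),
    ("ACGTACGTAG", "TTGGCCAATA"),
    ("ACGTTCGTAC", "TTGGCCAATT"),
]
unique = list(collapse_duplicates(pairs))
print(f"kept {len(unique)} of {len(pairs)} pairs")
```

Keying on both mates means two pairs are treated as duplicates only when both reads match, which is stricter than deduplicating single reads. A real implementation would also need to parse FASTQ records and decide which duplicate to keep, for example the one with the highest quality.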
Novoalign, SAMtools and VarScan2 were used for consensus sequence generation and variant calling. We filtered variants based on quality, depth, frequency and strand bias (quality ≥30, at least 5 reads at the variant site, frequency ≥5 %, p value <0.01). The variants were annotated with SnpEff.
We compared the number of non-synonymous changes observed in each of the MuV genes with the number expected if the observed missense mutations were randomly distributed across the genome, correcting for gene length. Significant deviation from the expected number of mutations was tested with a two-tailed exact binomial goodness-of-fit test. The analysis was conducted in the statistical language R (http://www.r-project.org).
We estimated a maximum likelihood phylogenetic tree using RAxML and 64 full MuV genomes from GenBank (accessed on 20 June 2015). […]
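The per-gene comparison of observed and expected non-synonymous changes described above was run in R, and the exact call is not given. As an illustration only, the sketch below reimplements a two-sided exact binomial test in Python, assuming the common convention of summing the probabilities of all outcomes that are no more likely than the observed count, and assuming that each gene's expected share of mutations is its length divided by the total length of the genes considered. The gene names, lengths and counts are hypothetical placeholders, not values from the study.

```python
from math import comb
from typing import Dict, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_two_sided(k: int, n: int, p: float) -> float:
    """Two-sided exact p value: total probability of outcomes no likelier than k."""
    # A small relative tolerance guards against ties lost to rounding.
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1)) if pmf <= threshold)
    return min(1.0, total)


def gene_enrichment(
    observed: Dict[str, int], lengths: Dict[str, int]
) -> Dict[str, Tuple[float, float]]:
    """Return (expected count, p value) per gene under length-proportional placement."""
    n_mutations = sum(observed.values())
    total_length = sum(lengths[gene] for gene in observed)
    results = {}
    for gene, count in observed.items():
        p = lengths[gene] / total_length  # length-corrected null probability
        results[gene] = (n_mutations * p, exact_binomial_two_sided(count, n_mutations, p))
    return results


# Hypothetical example: two genes of equal length sharing ten mutations.
observed_counts = {"gene_a": 8, "gene_b": 2}
gene_lengths = {"gene_a": 500, "gene_b": 500}
for gene, (expected, p_value) in gene_enrichment(observed_counts, gene_lengths).items():
    print(f"{gene}: observed={observed_counts[gene]} expected={expected:.1f} p={p_value:.4f}")
```

Because one test is run per gene, a multiple-testing correction may be appropriate when interpreting the p values. The source does not say whether one was applied.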

Pipeline specifications

Software tools: PEAR, PRINSEQ, NovoAlign, BLASTN, Velvet, BLASTX, metaMix, Trim Galore!, SPAdes, QUAST, SAMtools, VarScan, SnpEff, RAxML
Applications: Phylogenetics, De novo sequencing analysis
Organisms: Homo sapiens
Diseases: Brain Diseases, Encephalitis, Immunologic Deficiency Syndromes, Nervous System Diseases, Rubella, Subacute Sclerosing Panencephalitis, Severe Combined Immunodeficiency