Computational protocol: Virome and bacteriome characterization of children with pneumonia and asthma in Mexico City during winter seasons 2014 and 2015

[…] A whole-genome shotgun approach was used for both viral and bacterial genomes. Libraries were built with 1 ng of amplified DNA from each sample of RNA viruses and DNA viruses/bacteria and processed with the Nextera XT DNA Library Preparation Kit according to the manufacturer’s instructions (Illumina). Libraries were labeled with the Nextera XT Index Kit (Illumina) and pooled into one. The pooled library was loaded in a flow cell, and sequencing was performed on a MiSeq Desktop Sequencer (Illumina) to obtain paired-end reads of 250 bp in length (2x250).
An in-house pipeline was developed to perform the bioinformatics analysis for all files (step 3). Datasets of human, bacterial and viral sequences were downloaded from the NCBI FTP server in March 2015 and used to build Smalt and BWA indexes and local BLASTN and BLASTX databases. All reads from the MiSeq instrument were trimmed to remove bases with a Phred-like quality score below 20 (Q20) using Fastqc software. Reads were aligned to the databases described above using standalone BLASTn, with an E-value cutoff of 1e-30, for direct assignment of each read. For viruses, BLASTn with an E-value cutoff of 1e-30 was used on trimmed reads, and BLASTx with an E-value cutoff of 1e-01 was used after human, bacterial and viral mapping. The Velvet algorithm was used for contig assembly.
To reduce data complexity for bacterial detection, human sequences were removed by mapping against the human Smalt index before the pathogen identification process (step 3). Bacterial species were identified using local BLASTn with an E-value cutoff of 1e-30, and the first 10 hits were retained for each sequence. MEGAN 5.2.2 software was used to assign reads to the most appropriate taxonomic level by assigning each read to the lowest common taxonomic ancestor of the organisms corresponding to its set of significant hits. [...]
For the Rhinovirus analysis, full-length genomes of Human Rhinovirus A, B and C (RV-A, RV-B, RV-C) and Enterovirus 68 and 71 (EV68, EV71) were retrieved from the GenBank database.
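The read-assignment step described above (keep BLASTn hits at or below the E-value cutoff, retain the first 10, then place the read at the lowest common ancestor of those hits) can be illustrated with a minimal Python sketch. This is not the authors' pipeline or MEGAN's implementation: the toy `PARENT` taxonomy, the function names, and the choice to filter by E-value before truncating to 10 hits are all assumptions made for illustration, and MEGAN applies additional score-based filters that are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent). A real run would load the
# NCBI taxonomy instead; these entries are for illustration only.
PARENT: Dict[str, str] = {
    "Streptococcus pneumoniae": "Streptococcus",
    "Streptococcus mitis": "Streptococcus",
    "Streptococcus": "Streptococcaceae",
    "Streptococcaceae": "Bacteria",
    "Moraxella catarrhalis": "Moraxella",
    "Moraxella": "Moraxellaceae",
    "Moraxellaceae": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root of the toy taxonomy down to `taxon`."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def significant_hits(
    hits: List[Tuple[str, float]],
    max_evalue: float = 1e-30,
    max_hits: int = 10,
) -> List[str]:
    """Keep taxa of hits with E-value <= max_evalue, then the first max_hits.

    Assumes `hits` is already ordered best-first, as in BLAST tabular output.
    """
    kept = [taxon for taxon, evalue in hits if evalue <= max_evalue]
    return kept[:max_hits]


def lowest_common_ancestor(taxa: List[str]) -> Optional[str]:
    """Return the deepest node shared by the lineages of all `taxa`.

    Returns None when there are no taxa or the lineages share no node
    (for example, a taxon missing from the toy taxonomy).
    """
    if not taxa:
        return None
    paths = [lineage(t) for t in taxa]
    lca = None
    for level in zip(*paths):  # walk all lineages root-first, in lockstep
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


if __name__ == "__main__":
    read_hits = [
        ("Streptococcus pneumoniae", 1e-50),
        ("Streptococcus mitis", 1e-40),
        ("Moraxella catarrhalis", 1e-5),  # fails the 1e-30 cutoff
    ]
    print(lowest_common_ancestor(significant_hits(read_hits)))
```

In this sketch a read whose surviving hits span two Streptococcus species is placed at the genus rather than at either species, which is the conservative behavior the lowest-common-ancestor rule is meant to provide.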
For phylogenetic analysis of Parvovirus B19 (PVB19), Bocavirus (BoV) and Respiratory Syncytial Virus B (RSV-B), full-length genomes of representative human strains were used. Analyses were also performed for three different anelloviruses: Torque teno virus (TTV), Torque teno midi virus (TTMDV) and Torque teno mini virus (TTMV). Alignments were created and manually edited with MEGA 6.0. An unrooted maximum likelihood tree with 1,000 bootstrap replicates was constructed using the Tajima-Nei model with 5-parameter gamma-distributed rates. […]
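The bootstrap step mentioned above rests on resampling alignment columns with replacement; a tree is then re-inferred from each pseudo-replicate, and the support for a clade is the fraction of replicates in which it recurs. The following Python sketch shows only the resampling part, under stated assumptions: the function names and the dictionary-based alignment format are hypothetical, and the tree inference itself (performed in MEGA in the protocol) is not reproduced.

```python
import random
from typing import Dict, List


def bootstrap_alignment(
    alignment: Dict[str, str], rng: random.Random
) -> Dict[str, str]:
    """Build one pseudo-replicate by sampling columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    The same sampled columns are applied to every sequence, so each
    resampled column is an intact column of the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[c] for c in cols)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int = 1000, seed: int = 0
) -> List[Dict[str, str]]:
    """Return `n_replicates` resampled alignments (1,000 in the protocol)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seq1": "ACGTAC", "seq2": "ACGAAC", "seq3": "TCGAAC"}
    replicates = bootstrap_replicates(toy, n_replicates=3, seed=42)
    for rep in replicates:
        print(rep)
```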

Pipeline specifications

Applications: Phylogenetics; Metagenomic sequencing analysis
Organisms: Homo sapiens; Streptococcus phage EJ-1; TTV-like mini virus; Bacteria; Moraxella catarrhalis; Cutibacterium acnes; Streptococcus pneumoniae
Diseases: Acinetobacter Infections; Asthma; Meningitis, Haemophilus; Pneumonia; Pneumonia, Mycoplasma; Respiratory Insufficiency