Computational protocol: Characterization of Three New Insect Specific Flaviviruses: Their Relationship to the Mosquito Borne Flavivirus Pathogens

[…] Viral RNA (∼0.9 µg) was fragmented by incubation at 94°C for 8 minutes in 19.5 µL of fragmentation buffer (Illumina 15016648). A sequencing library was prepared from the sample RNA using an Illumina TruSeq RNA v2 kit following the manufacturer’s protocol. The sample was sequenced on a HiSeq 1500 using the 2 × 50 paired-end protocol. Reads in FASTQ format were quality-filtered, and any adapter sequences were removed, using Trimmomatic.

In all samples, host reads were filtered out before de novo assembly. The de novo assembler ABySS was then used to assemble the reads into contigs, using several different sets of reads and k values from 20 to 40. The longest contigs were selected, and reads were mapped back to them using Bowtie2 and visualized with the Integrative Genomics Viewer to verify that the assembled contigs were correct.

A total of 28.8, 10.0, 5.8, 8.5, 11.4, 9.5, 11.1, 16.5, 11.0, 19.9, and 21.0 million reads were generated for the samples containing MMV, KKV, LTNV, EVG 1_33, EVG 1_42, EVG 2_28, EVG 2_30, EVG 2_81, EVG 2_86, EVG 5_61, and EVG 5_72, respectively. Reads mapping to the virus in each sample comprised ∼1,960,000 (6.83%), ∼340,000 (3.37%), ∼350,000 (6.0%), ∼3,100,000 (36.6%), ∼3,300,000 (29.0%), ∼7,870,000 (82.6%), ∼460,000 (4.1%), ∼3,770,000 (22.8%), ∼2,780,000 (25.3%), ∼2,150,800 (10.8%), and ∼6,400,000 (30.5%), respectively.

[...] The evolutionary history was inferred by using the maximum likelihood method based on the General Time Reversible model. The tree with the highest log likelihood (−366425.2857) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood approach, and then selecting the topology with superior log likelihood value.
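
To illustrate what a "matrix of pairwise distances" looks like as input to Neighbor-Join or BioNJ, the following is a minimal sketch only. It substitutes the simple p-distance (proportion of differing sites) for the maximum composite likelihood estimate that MEGA7 actually computes, and the sequence names, toy sequences, and function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, valid="ACGT"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    This is a stand-in for the maximum composite likelihood distance
    named in the protocol, not a reimplementation of it.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Build a symmetric pairwise distance matrix as a dict of dicts."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return names, matrix


# Hypothetical toy alignment (not data from the study).
toy_alignment = {
    "virus_A": "ATGGCTAACG",
    "virus_B": "ATGGCTAATG",
    "virus_C": "ATGACTGATG",
}
names, dist = distance_matrix(toy_alignment)
for name in names:
    print(name, [round(dist[name][other], 3) for other in names])
```

A distance-based method would cluster the closest pair first (here virus_A and virus_B); in the protocol, the resulting topology serves only as a starting point for the maximum likelihood search.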
A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories [+G, parameter = 0.8309]). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 10.5704% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analyses involved 93 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 5,981 positions in the final dataset. Evolutionary analyses were conducted in MEGA7. […]
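
The elimination of all positions containing gaps and missing data (complete deletion) can be sketched as follows. This is an illustrative sketch rather than MEGA7's implementation: it assumes that any character other than A, C, G, or T counts as a gap or missing data, and the function name and toy sequences are hypothetical.

```python
def complete_deletion(alignment, valid="ACGT"):
    """Drop every alignment column in which any sequence has a gap or missing data.

    Returns the filtered alignment and the list of retained column indices.
    Assumption: any character outside `valid` is treated as gap/missing.
    """
    seqs = {name: seq.upper() for name, seq in alignment.items()}
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_sites = lengths.pop()

    # Keep a column only if every sequence has an unambiguous base there.
    kept = [
        i for i in range(n_sites)
        if all(seq[i] in valid for seq in seqs.values())
    ]
    filtered = {
        name: "".join(seq[i] for i in kept) for name, seq in seqs.items()
    }
    return filtered, kept


# Hypothetical toy alignment (not data from the study).
toy = {
    "virus_A": "ATG-CTAACG",
    "virus_B": "ATGGCTNATG",
    "virus_C": "ATGACTGATG",
}
filtered, kept = complete_deletion(toy)
print(len(kept), "positions retained")
for name, seq in filtered.items():
    print(name, seq)
```

Applied to the 93-sequence alignment described above, a filter of this kind is what leaves the 5,981 positions reported for the final dataset.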

Pipeline specifications

Software tools Trimmomatic, ABySS, Bowtie2, MEGA
Applications Phylogenetics, WES analysis
Organisms Viruses, Culex flavivirus