Computational protocol: Virome analysis for identification of novel mammalian viruses in bats from Southeast China

[…] The pre-product was sequenced following the standard Illumina workflow. The raw data were processed with an in-house program that removed adapter and host sequences, duplicated reads, and low-quality reads (those containing > 2 N bases), yielding clean data. A reference database was built with in-house procedures by extracting bacterial, fungal, archaeal, and viral sequences from the nucleotide database; the reads filtered by the Short Oligonucleotide Analysis Package (SOAPaligner, version 2) were then searched against this database with BLAST. Based on the resulting alignments, reads with a high degree of correlation were assigned a species classification. Reads and sequences with high similarity and homology were merged. Among all BLAST results, the best hit with an E value < 10e-5 was used as the gene annotation. Functional analysis of all genes was performed by BLAST alignment against the KEGG (Kyoto Encyclopedia of Genes and Genomes) and eggNOG (Evolutionary genealogy of genes: Non-supervised Orthologous Groups) databases. [...] Positive sequences obtained by PCR amplification were searched against GenBank with BLAST, and the homologous sequences were downloaded from GenBank. The sequences were aligned with ClustalW and then compared in MEGA6. Phylogenetic reconstructions were performed in MEGA6 using the maximum likelihood or the neighbor-joining method with 1,000 bootstrap replicates. […]
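The read-cleaning and best-hit selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' internal program, which is not published in this excerpt. The names `clean_reads`, `Hit`, and `best_hits` are hypothetical, duplicates are assumed to mean identical sequences, the "optimal" result is assumed to be the hit with the lowest E value, and adapter and host removal are not shown.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple

MAX_N_BASES = 2         # from the text: reads with > 2 N bases are discarded
E_VALUE_CUTOFF = 10e-5  # threshold exactly as written in the text


def clean_reads(reads: Iterable[str], max_n: int = MAX_N_BASES) -> list[str]:
    """Drop reads with more than max_n N bases, then drop exact duplicates.

    Assumption: two reads are duplicates when their sequences are identical
    after upper-casing. The first occurrence is kept.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for read in reads:
        seq = read.strip().upper()
        if seq.count("N") > max_n:
            continue  # low-quality read
        if seq in seen:
            continue  # duplicate of a read already kept
        seen.add(seq)
        kept.append(seq)
    return kept


class Hit(NamedTuple):
    """Hypothetical minimal record for one alignment hit."""
    query: str
    subject: str
    e_value: float


def best_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> dict[str, Hit]:
    """Keep, for each query, the hit with the lowest E value below the cutoff."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.e_value >= cutoff:
            continue  # fails the E value < cutoff criterion
        current = best.get(hit.query)
        if current is None or hit.e_value < current.e_value:
            best[hit.query] = hit
    return best
```

For example, `clean_reads(["ACGT", "acgt", "ANNNT"])` keeps only one copy of `ACGT` and discards the read with three N bases. In practice the hits would be parsed from BLAST output rather than constructed by hand.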

Pipeline specifications

Software tools: SOAP, SOAPaligner, MEGA, Clustal W
Databases: KEGG
Application: Phylogenetics
Organisms: Homo sapiens, unidentified adenovirus