Computational protocol: Identification of Hepatotropic Viruses from Plasma Using Deep Sequencing: A Next Generation Diagnostic Tool

Protocol publication

[…] Low quality bases (quality score “B”) were trimmed from the 3′ end of each read end, and the remaining sequence was kept only when its high quality 5′ moiety was longer than 29 nt. Primer/adapter sequences and low complexity regions were also trimmed out. SOAPaligner was used to remove mitochondrial and ribosomal RNAs, using the default parameters. RepeatMasker was then used with a database containing simple repeats, ribosomal and mitochondrial sequences, to further filter the reads with higher sensitivity. The remaining sequences were considered clean reads.

To minimize CPU time, clean reads were aligned to the human, bacteria, and virus databases from NCBI, using SOAPaligner with the default parameters. The remaining read ends were aligned with standalone BLASTn against the databases described above, with an E-value cutoff of 1e-05. For the human database, 80% coverage and 80% identity were also required. When a query hit more than one taxon, hits were sorted by E-value and additional taxa were kept if their hits were within two nucleotides of the top hit. For example, if a query hit the human database with 55 identical nucleotides and the virus database with 53 identical nucleotides, the query sequence was assigned to the ambiguous category human-virus (HV). Additionally, the taxonomy assigned to each end of a pair was compared, and the ambiguity was resolved when both ends intersected the same taxon. For instance, if one end of a pair was classified as HV and its counterpart as H, the pair was reclassified as H-H. Read ends that could not be classified were binned as unknown.

Non-human and non-bacterial (unknown, viral, phage, human endogenous retrovirus [HERV] and ambiguous) read ends were subjected to de novo assembly with the SOAPdenovo-Trans software, using the default parameters.
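
The trimming and read-classification rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `Hit`, `trim_read`, `assign_taxa`, `resolve_pair` and `pair_label` are hypothetical, the one-letter taxon labels are invented for the example, "trimming" is assumed to mean removing the trailing run of "B" quality characters, and "within two nucleotides" is assumed to compare the count of identical nucleotides against the hit with the lowest E-value. The E-values in the example are made up; only the 55 and 53 identical nucleotides come from the text.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

MIN_LENGTH = 30      # keep reads whose trimmed length is longer than 29 nt
IDENTITY_WINDOW = 2  # keep taxa within two identical nucleotides of the top hit


class Hit(NamedTuple):
    taxon: str      # hypothetical one-letter label, e.g. "H", "B" or "V"
    evalue: float
    identical: int  # number of identical nucleotides in the alignment


def trim_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Drop the trailing run of 'B' qualities (assumed meaning of trimming).

    Returns None when the remaining 5' part is not longer than 29 nt.
    """
    kept = len(qual.rstrip("B"))
    if kept < MIN_LENGTH:
        return None
    return seq[:kept], qual[:kept]


def assign_taxa(hits: List[Hit]) -> FrozenSet[str]:
    """Return the taxa for one read end; more than one taxon means ambiguous."""
    if not hits:
        return frozenset({"unknown"})
    ranked = sorted(hits, key=lambda h: h.evalue)  # lowest E-value first
    top = ranked[0]
    return frozenset(
        h.taxon for h in ranked
        if top.identical - h.identical <= IDENTITY_WINDOW
    )


def resolve_pair(
    end1: FrozenSet[str], end2: FrozenSet[str]
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """If both ends share at least one taxon, assign the shared taxa to both."""
    shared = end1 & end2
    if shared:
        return shared, shared
    return end1, end2


def pair_label(end1: FrozenSet[str], end2: FrozenSet[str]) -> str:
    """Format a pair as in the text, e.g. 'HV-H' or 'H-H'."""
    return "-".join("".join(sorted(taxa)) for taxa in (end1, end2))


# Example from the text: 55 identical nt (human) vs 53 identical nt (virus).
end1 = assign_taxa([Hit("H", 1e-20, 55), Hit("V", 1e-19, 53)])  # ambiguous HV
end2 = assign_taxa([Hit("H", 1e-25, 60)])                       # H only
print(pair_label(end1, end2))                      # HV-H before resolution
print(pair_label(*resolve_pair(end1, end2)))       # H-H after resolution
```

Representing each read end as a set of taxa makes the pair rule a plain set intersection, so an end that stays ambiguous after comparison (for example HV paired with HV) simply keeps its ambiguous label.
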
We chose the transcriptome assembler SOAPdenovo-Trans instead of the older genome assembler SOAPdenovo because it accounts for uneven coverage, which is present in both transcriptomic and metagenomic libraries. Assembled scaffolds were aligned with BLASTx against the ‘nr’ taxonomy databases, including archaea, bacteria, HERV, fungus, plant, human, invertebrate, mammal, phage, protist, vertebrate and virus entries. As before, an E-value cutoff of 1e-05 was used and the top hit was reported. For phycodnavirus and mimivirus top hits, some scaffolds were re-analyzed with the HHblits algorithm, using the most recent version of the UniProt20 database. HHblits represents both query and database sequences as profile hidden Markov models; these profiles are then aligned using HHsearch. […]
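
The "E-value cutoff, then report the top hit" step for the scaffolds can be illustrated as follows. This is a sketch under assumptions rather than the published pipeline: it assumes the BLASTx results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which the text does not state. It also assumes the cutoff is inclusive and that ties in E-value are broken by bit score. The function name `top_hits` and the example rows are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the protocol


def top_hits(
    tabular_text: str, cutoff: float = E_VALUE_CUTOFF
) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) of its best passing hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank, malformed or comment lines.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # fails the E-value cutoff
        current = best.get(query)
        # Lower E-value wins; higher bit score breaks ties (an assumption).
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Invented rows: two hits for scaffold1, one weak hit for scaffold2.
example = "\n".join([
    "scaffold1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200",
    "scaffold1\tprotB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-40\t250",
    "scaffold2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30",
])
for query, (subject, evalue, bitscore) in top_hits(example).items():
    print(query, subject, evalue, bitscore)
```

In this example only scaffold1 is reported, with protB as its top hit; scaffold2 has no hit passing the cutoff and would remain unassigned.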

Pipeline specifications

Software tools: SOAPaligner, RepeatMasker, BLASTN, SOAPdenovo-Trans, BLASTX, HHblits, HHSearch
Applications: Metagenomic sequencing analysis, Amino acid sequence alignment
Organisms: Homo sapiens, GB virus C
Diseases: Fatty Liver, Alcoholic; Hepatitis B; Hepatitis C; Liver Diseases; Hepatitis, Autoimmune; Hepatitis B, Chronic; Chemical and Drug Induced Liver Injury