Computational protocol: Sequence Analysis of the Human Virome in Febrile and Afebrile Children

Protocol publication

[…] A pipeline was developed for the analysis of large numbers of short sequence reads. This was adapted from that used for the analysis of 454 sequences, which used BLASTn and tBLASTx to align sequences to references in the NT database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/), followed by a manual review of the viral alignments. The details of the protocol for analysis of short reads follow. After removal of primer sequences, completely identical sequences were collapsed into a single representative sequence to minimize the number of sequences to be analyzed. Low complexity sequences were then masked using Dust. Sequences with more than 20 N nucleotides (either from sequencing error or as a result of Dust masking) were removed. Human sequences were identified for removal by aligning sequences to the Genome Reference Consortium’s human build 36 (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/data.shtml), including unplaced, human mitochondrial, and 5.8S, 18S, and 28S rDNA sequences, using cross_match with the following alignment parameters: minscore 70, bandwidth 3, penalty -1, gap_init -1, gap_ext -1, masklevel 0. Non-human sequences were aligned to a metagenomic database consisting of all virus and phage sequences in NCBI NT plus full genomes from other microbes including bacteria, archaea, and small eukaryotes (Mitreva et al., unpublished). Cross_match was used with the same parameters used for the human alignments. Any sequences that remained unaligned after nucleotide alignment were then aligned to NR (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) using WU-BLAST (BlastX) with the following parameters: filter seg, W 6, WINK 6, nogap. Sequences that aligned to microbial references using either cross_match or WU-BLAST were confirmed by WU-BLAST alignment to the larger NT database. Virus alignments were then manually evaluated, and ambiguous alignments were removed.
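The two pre-filtering steps described above (collapsing identical reads and discarding reads with more than 20 N nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the example reads are hypothetical, and the Dust masking that the protocol runs between the two steps is not implemented here.

```python
from collections import Counter

# Threshold taken from the text: reads with more than 20 N bases are removed.
MAX_N = 20


def collapse_identical(reads):
    """Collapse exact duplicates into {sequence: copy_count}, in first-seen order."""
    return dict(Counter(reads))


def drop_n_rich(sequences, max_n=MAX_N):
    """Keep only sequences that contain at most max_n 'N' characters."""
    return [s for s in sequences if s.upper().count("N") <= max_n]


# Hypothetical toy reads; in the protocol, Dust masking would run between
# the two steps and could itself introduce additional N characters.
reads = ["ACGTACGT", "ACGTACGT", "N" * 21 + "ACGT", "N" * 20 + "ACGT"]
unique = collapse_identical(reads)
kept = drop_n_rich(unique)
```

Keeping the copy count for each representative sequence is a design choice of this sketch, so that read abundances could be recovered after the alignments; the text does not say whether the original pipeline stored them.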
The same protocol was used for the analysis of the 75-mer data, except a minscore of 50 was used in the cross_match alignments. Detailed sequence statistics are presented in . shows the number of virus sequences found with cross_match and BlastX, without scaling. Viral sequences were assembled into contigs using Tigra (Chen L and Weinstock G, unpublished). […]
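The cross_match settings quoted in the protocol, including the lower minscore used for the 75-mer data, could be captured in a small helper such as the one below. This is a sketch under stated assumptions: the helper name, the file names, and the argument layout (`cross_match query reference -option value`) are illustrative rather than taken from the source, and the command is only built here, not executed.

```python
# Alignment parameters as listed in the protocol text.
CROSS_MATCH_PARAMS = {
    "minscore": 70,
    "bandwidth": 3,
    "penalty": -1,
    "gap_init": -1,
    "gap_ext": -1,
    "masklevel": 0,
}


def cross_match_args(query_fasta, reference_fasta, reads_75mer=False):
    """Build an argument list for cross_match (hypothetical helper).

    For the 75-mer data the protocol lowers minscore to 50 and keeps
    every other parameter unchanged.
    """
    params = dict(CROSS_MATCH_PARAMS)
    if reads_75mer:
        params["minscore"] = 50
    args = ["cross_match", query_fasta, reference_fasta]
    for name, value in params.items():
        args += [f"-{name}", str(value)]
    return args


# Hypothetical file names, for illustration only.
default_cmd = cross_match_args("reads.fa", "human_build36.fa")
short_cmd = cross_match_args("reads_75mer.fa", "human_build36.fa", reads_75mer=True)
```

A list of this form could be passed to `subprocess.run` on a system where cross_match is installed; the same helper would serve for both the human and the metagenomic database alignments, since the text states that identical parameters were used for each.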

Pipeline specifications

Software tools: BLASTN, TBLASTX, WU-BLAST, BLASTX, TIGRA
Databases: GRC
Applications: Metagenomic sequencing analysis, Nucleotide sequence alignment
Organisms: Homo sapiens, Astrovirus MLB2
Diseases: Infection, Virus Diseases, HIV Infections