Computational protocol: Direct sequencing of human gut virome fractions obtained by flow cytometry

Protocol publication

[…] The obtained sequences were filtered and quality trimmed, and adaptors were removed, using Roche’s SFFINFO tool. They were then double-checked for the presence of Y adaptors at the 3′ end of the DNA using the Biostrings v.2.11 package in the R programming language. Low-complexity reads (entropy <50), low-quality reads (<25), short reads (<50 bp), and erroneous reads (>5% N bases) were removed using PRINSEQ. Sequences were assembled with MIRA3 using de novo genome accurate 454 settings, permitting the assembly of as few as two reads per contig.

The open reading frames (ORFs) in the larger contigs (>1,000 bp) were identified by Glimmer3 and annotated by an InterProScan search against all available databases, which combines the individual strengths of these annotation sources and provides comprehensive information about protein families, domains, and functional sites. Maps of the ORFs detected in the contigs were constructed using the genoPlotR package in R. In addition, the larger contigs (>1,000 bp) were annotated using the “blastx” algorithm against the “nr” database. To decide whether a sequence could be classified as virus/phage by “blastx”, we followed a previously described approach, reviewing the 100 best matches. The same contigs were also searched against the ACLAME database of mobile genetic elements.

Contigs shorter than 1,000 bp and unassembled reads were annotated by “blastn” against the “nr” database using the “megablast” algorithm. The taxonomic assignments of the best GI matches were retrieved by a script written in R using the ape package. The frequencies of all genera, as well as those of the ten most frequent genera, were compared across the two datasets using Pearson correlations. These sequences were also compared with the phiSITE database, which contains only viral genomes.

The workflow of the data analysis is shown in the accompanying figure. […]
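The read-filtering thresholds described above can be illustrated with a minimal Python sketch. This is not PRINSEQ's implementation. The function names are hypothetical, the "<25" quality cutoff is assumed here to mean the mean Phred score of a read, and the entropy score is a simplified stand-in (Shannon entropy of overlapping trinucleotides over the whole read, scaled to 0-100) rather than PRINSEQ's exact formula.

```python
import math
from collections import Counter

# Thresholds taken from the protocol text; their exact interpretation
# (e.g. mean Phred quality per read) is an assumption of this sketch.
MIN_LENGTH = 50          # reads shorter than 50 bp are dropped
MIN_MEAN_QUALITY = 25    # assumed: mean Phred score per read
MAX_N_FRACTION = 0.05    # more than 5% N bases is treated as erroneous
MIN_ENTROPY = 50         # complexity score on a 0-100 scale


def trinucleotide_entropy(seq):
    """Shannon entropy of overlapping 3-mers, scaled to 0-100.

    Simplified stand-in for a low-complexity score; not PRINSEQ's formula.
    """
    kmers = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(kmers) < 2:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # The largest possible entropy is limited by the 64 distinct 3-mers
    # and by the number of 3-mers the read actually contains.
    max_entropy = math.log2(min(64, total))
    return 100.0 * entropy / max_entropy


def passes_filters(seq, quals):
    """Return True if a read survives all four filters."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False                                  # short read
    if seq.count("N") / len(seq) > MAX_N_FRACTION:
        return False                                  # too many ambiguous bases
    if sum(quals) / len(quals) < MIN_MEAN_QUALITY:
        return False                                  # low mean quality
    if trinucleotide_entropy(seq) < MIN_ENTROPY:
        return False                                  # low complexity
    return True
```

In this sketch a read is a sequence string plus a list of per-base Phred scores of the same length. A 60 bp homopolymer or a simple `ACGT` repeat is rejected by the complexity filter, while a read containing many distinct trinucleotides passes if its length, quality, and N content are acceptable.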

Pipeline specifications

Software tools: Biostrings, PRINSEQ, Glimmer, InterProScan, genoPlotR, BLASTX, BLASTN, APE
Databases: ACLAME, phiSITE
Applications: Phylogenetics, Genome data visualization
Organisms: Homo sapiens