Computational protocol: Viral communities associated with healthy and bleaching corals

Protocol publication

[…] Plasmids containing viral DNA inserts were sequenced on capillary sequencers (Agencourt Biosciences, Beverly, MA, and SymBio Corp, Menlo Park, CA). PHRED was used to call bases and to trim vector and adapter sequences, and base quality was scored using PHRAP. Bases with PHRAP scores below 20 were masked with an N. FastGroupII was then used to trim sequence ends containing fewer than 50 unambiguous bases, and trimmed sequences shorter than 100 bp were excluded from analysis.

Sequences were assembled into contigs using TIGR Assembler. Chromatograms from all contigs were visually inspected for identical sequences, which arise when a single bacterial clone grows in two neighbouring culture wells and is sequenced twice. Such duplicates were removed from analysis. The DsH library yielded a total of 1580 high-quality sequences and the DsB library yielded 930. Contig spectra were determined from the sequence assemblies and used to model viral community structure with the PHACCS (PHAge Communities from Contig Spectrum) online tool. [...]

Sequences were compared with the GenBank NR and NCBI ENV databases using tblastx, and significant hits were defined as matches with E-values below 0.001. For summary statistics, sequences were sorted according to their hits to each of the two databases. For analysis of viral types, sequences were categorized by their top viral hit and further sorted by viral host (microbe or eukaryote).

Sequences were then compared with two smaller databases using blastx (E-value < 0.001). First, all sequences were compared with a database of complete genomes from eukaryote-specific viruses. These RefSeq-curated genome sequences were downloaded from the NCBI Viral Genomes Resource on 20 August 2007 (complete list of accession numbers provided in Appendix S1). The resulting database of 1974 viral genomes contained a total of 28 456 annotated proteins.
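The read-cleaning rules above can be sketched in Python. These rules are: mask bases scoring below 20, trim poor-quality ends, discard sequences shorter than 100 bp and remove duplicates. This is a minimal illustration under stated assumptions, not the pipeline's actual code. The Read record and all function names are hypothetical. The end-trimming rule is only one possible reading of the FastGroupII criterion. Exact string matching stands in for the visual chromatogram inspection the authors used to find duplicates.

```python
from dataclasses import dataclass

UNAMBIGUOUS = set("ACGT")


@dataclass
class Read:
    name: str
    bases: str   # base calls with vector/adapter already removed (assumed)
    quals: list  # per-base quality scores on the PHRED scale (assumed)


def mask_low_quality(read, min_q=20):
    """Replace every base whose quality score is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(read.bases, read.quals))


def trim_ends(seq, window=50):
    """Hypothetical end-trimming rule (not FastGroupII's code): keep the span from
    the first to the last window of `window` consecutive unambiguous bases."""
    clean_starts = [
        i for i in range(len(seq) - window + 1)
        if all(c in UNAMBIGUOUS for c in seq[i:i + window])
    ]
    if not clean_starts:
        return ""
    return seq[clean_starts[0]:clean_starts[-1] + window]


def clean_reads(reads, min_q=20, window=50, min_len=100):
    """Mask, trim, length-filter and drop exact duplicates; returns {name: seq}."""
    kept, seen = {}, set()
    for read in reads:
        seq = trim_ends(mask_low_quality(read, min_q), window)
        if len(seq) < min_len:
            continue  # shorter than 100 bp after trimming
        if seq in seen:
            continue  # exact duplicate, e.g. one clone grown in two wells
        seen.add(seq)
        kept[read.name] = seq
    return kept
```

In this sketch, a read whose last 30 bases score below 20 has those bases masked to N and then trimmed away. A second read with an identical cleaned sequence is dropped as a duplicate.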
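A contig spectrum, as used above, counts how many contigs were assembled from one, two, three or more sequences. Unassembled sequences count as one-sequence contigs. The sketch below assumes the assembler's output has already been parsed into a hypothetical mapping from contig name to member read names. It does not reproduce TIGR Assembler's output format or the exact input file format PHACCS expects.

```python
from collections import Counter


def contig_spectrum(contig_members, all_reads):
    """Return [c1, c2, ..., c_max], where c_k is the number of contigs assembled
    from exactly k reads. Reads that joined no contig count as singletons (k = 1)."""
    assigned = set()
    sizes = Counter()
    for contig, members in contig_members.items():
        members = set(members)
        if not members:
            continue  # ignore empty contig records
        if members & assigned:
            raise ValueError(f"contig {contig} shares reads with another contig")
        assigned |= members
        sizes[len(members)] += 1
    # every read that was not placed in a contig is a singleton
    sizes[1] += len(set(all_reads) - assigned)
    return [sizes[k] for k in range(1, max(sizes) + 1)]
```

Every sequence is counted exactly once. The sum of k times c_k should therefore equal the number of sequences in the library (for example, 1580 for DsH). This gives a quick sanity check before modelling.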
In the second comparison, sequences were searched against the Phage Sequence Databank (version released 5 December 2006), which contains 510 complete phage genomes as well as manually curated phage and prophage sequences. A list of blastx hits and a list of accession numbers used in the eukaryote-specific virus database are available from the authors. To visualize the distribution of sequence hits across viral genomes, sequences were compared with the same two databases using tblastx (E-value cut-off = 0.001), and the best hit was used to map each sequence onto a viral genome. [...]

Sequence libraries were compared with a database of all complete phage genomes using blastn (significance cut-off: E-value < 0.01). Significant hits were mapped onto a previously constructed multiprotein phylogenetic tree of the phage genomes. The tree topology and the distribution of hits along this tree were uploaded to the UniFrac online computational platform. The Cluster Environments analysis was used to group environments by the similarity of the viral lineages each contained (100 jackknife permutations; Use abundance weights = True). The UniFrac significance test was used to compare pairs of viral communities (100 permutations; Use abundance weights = True). […]
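The hit filtering above applies an E-value threshold and then keeps the single best hit per sequence. That step, and the placement of each best hit along a viral genome for plotting, can be sketched as follows. The sketch assumes BLAST's 12-column tabular output (BLAST+ `-outfmt 6`); the 2007 analysis may have used a different output format. The function names, the bit-score tie-breaker and the midpoint binning are illustrative choices rather than the authors' procedure.

```python
import csv
import io

# Column order of BLAST's 12-column tabular output (BLAST+ -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-3):
    """Keep, for each query, the single best hit with E-value < max_evalue
    (lowest E-value first, highest bit score as tie-breaker)."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        evalue, bits = float(hit["evalue"]), float(hit["bitscore"])
        if evalue >= max_evalue:
            continue  # not significant under the chosen cut-off
        rank = (evalue, -bits)
        if hit["qseqid"] not in best or rank < best[hit["qseqid"]][0]:
            best[hit["qseqid"]] = (rank, hit)
    return {query: hit for query, (_, hit) in best.items()}


def genome_bins(hits, genome_lengths, n_bins=10):
    """Count best-hit midpoints in n_bins equal-width bins along each genome."""
    bins = {genome: [0] * n_bins for genome in genome_lengths}
    for hit in hits.values():
        genome = hit["sseqid"]
        if genome not in genome_lengths:
            continue
        # sstart > send for minus-strand hits, so order the coordinates first
        start, end = sorted((int(hit["sstart"]), int(hit["send"])))
        midpoint = (start + end) / 2  # 1-based coordinate
        index = int((midpoint - 1) / genome_lengths[genome] * n_bins)
        bins[genome][min(index, n_bins - 1)] += 1
    return bins
```

Passing max_evalue=0.01 to best_hits gives the blastn threshold used for the phage-genome comparison.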
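The UniFrac comparison above uses abundance weights, that is, weighted UniFrac. For every branch of the tree, it multiplies the branch length by the absolute difference between the fractions of each community's hits that descend from that branch, then sums over branches. The sketch below computes the raw (unnormalized) weighted distance and a label-permutation test in the spirit of the UniFrac significance test. The tree encoding and function names are hypothetical, and the online tool's exact implementation and p-value convention may differ.

```python
import random
from collections import Counter, defaultdict


def weighted_unifrac(parent, counts_a, counts_b):
    """Raw weighted UniFrac: sum over branches of length * |A_i/A_T - B_i/B_T|.

    parent maps each non-root node to (parent_node, branch_length); counts_a and
    counts_b map leaf names to the number of hits each environment placed there."""
    diff = defaultdict(float)  # branch (named by its child node) -> A_i/A_T - B_i/B_T
    for counts, sign in ((counts_a, 1.0), (counts_b, -1.0)):
        total = sum(counts.values())
        for leaf, n in counts.items():
            node = leaf
            while node in parent:  # walk from the leaf up to the root
                diff[node] += sign * n / total
                node = parent[node][0]
    return sum(parent[node][1] * abs(d) for node, d in diff.items())


def unifrac_permutation_test(parent, counts_a, counts_b, n_perm=100, seed=0):
    """Shuffle environment labels over the pooled hits; return (observed, p)."""
    observed = weighted_unifrac(parent, counts_a, counts_b)
    pooled = [leaf for counts in (counts_a, counts_b)
              for leaf, n in counts.items() for _ in range(n)]
    n_a = sum(counts_a.values())
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = weighted_unifrac(parent, Counter(pooled[:n_a]), Counter(pooled[n_a:]))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

A matrix of pairwise distances from weighted_unifrac could be clustered, for example with UPGMA, to group environments. The jackknife support reported by Cluster Environments is not shown here.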

Pipeline specifications