Computational protocol: Locality and diel cycling of viral production revealed by a 24 h time course cross-omics analysis in a coastal region of Japan

Protocol publication

[…] Co-assembly of sequence reads from the nine OBV samples generated, after decontamination of prokaryotic sequences, 878 viral contigs longer than 10 kb (OBV long contigs, comprising 45 circularly assembled genomes and 833 non-circular genomes). In addition, this assembly generated 27,551 short contigs (i.e., longer than 1 kb but no longer than 10 kb). The abundance of these contigs was assessed based on the relative abundance of terminase large subunit genes (terL). Putative terL genes were detected as follows. Gene predictions were computed using MetaGeneMark. Each predicted gene was then searched against the Pfam database (version 31.0) with HHsearch (probability > 85% and p-value < 1e-5). The query hidden Markov models for this search were constructed using jackhmmer (included in the HMMER package) with default settings against the genes included in the 5095 viral genomic sequences (described below; Supplementary Table). As a result, 1348 genes (from 1330 contigs) were detected as putative terL genes (i.e., genes whose best hit was PF03354.14, PF04466.12, PF03237.14, or PF05876.11). FPKM (fragments per kilobase per million mapped reads) values for the putative terL genes were calculated using in-house Ruby scripts. When two putative terL genes were detected on one contig, the longer one was selected.
The viral genomic sequence set (likely originating from prokaryotic dsDNA viruses; 5095 sequences; Supplementary Table) was collected in a previous study. Each sequence belongs to one of the following three categories: (i) 878 viral contigs (> 10 kb, including 45 circularly assembled genomes and 833 non-circular genomes) obtained from OBVs, (ii) 1766 environmental viral genomes (EVGs; all circularly assembled) derived from marine virome studies (not including OBVs), and (iii) 2429 reference viral genomes (RVGs) of cultured dsDNA viruses.
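The FPKM calculation and the longer-terL rule described above can be illustrated with a minimal Python sketch. This is not the authors' in-house Ruby code, which is not shown in the source. The record layout, the function names (`fpkm`, `longest_terl_per_contig`), and the toy counts are all assumptions made for illustration, and the FPKM formula used is the standard definition (fragments divided by gene length in kilobases and by mapped fragments in millions).

```python
from collections import namedtuple

# Hypothetical record for one putative terL gene; field names are illustrative.
Gene = namedtuple("Gene", ["gene_id", "contig_id", "length_bp", "fragments"])


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and mapped-fragment total must be positive")
    # fragments / (length / 1e3) / (total / 1e6) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def longest_terl_per_contig(genes):
    """Keep one putative terL per contig, preferring the longer gene."""
    best = {}
    for gene in genes:
        current = best.get(gene.contig_id)
        if current is None or gene.length_bp > current.length_bp:
            best[gene.contig_id] = gene
    return best


# Toy input (made-up numbers): contigA carries two putative terL genes.
genes = [
    Gene("g1", "contigA", 1500, 300),
    Gene("g2", "contigA", 1200, 500),
    Gene("g3", "contigB", 2000, 100),
]
total_mapped = 1_000_000  # assumed number of mapped fragments in the sample

selected = longest_terl_per_contig(genes)
abundance = {
    contig: fpkm(g.fragments, g.length_bp, total_mapped)
    for contig, g in selected.items()
}
```

Here the longer gene `g1` represents `contigA` even though `g2` attracts more fragments, because the selection rule is based on gene length only.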
Genus-level genomic OTUs (gOTUs) were assigned for complete genomes (i.e., the 45 circularly assembled genomes in (i) and all genomes in (ii) and (iii)) as previously described. In addition, an OBV non-circular (i.e., possibly incomplete) genome sequence that showed similarity to one of the complete genomes (genomic similarity SG > 0.15) was assigned to the gOTU of the most similar complete genome. Putative host groups (Cyanobacteria, Bacteroidetes, SAR11, SAR116, and SAR86) were predicted from genomic similarity to known viruses and/or the gene contents of complete genomes. As viruses belonging to the same gOTU were predicted to have consistent host groups (i.e., no gOTU was assigned to more than one host group), putative host groups were also applied to the other members of each gOTU.
Quality-controlled virome and metatranscriptome sequence reads (Supplementary Table) were obtained through quality control steps as previously described. These reads were mapped against the viral genomic sequence set using Bowtie 2 with the parameter “--score-min L,0,-0.3”. FPKM values were calculated using in-house Ruby scripts. Hierarchical clustering of OBV contigs based on the Pearson correlation of time-series virome abundance was performed using the average linkage method implemented in MultiExperiment Viewer (MeV) version 4.9.0. The gap statistic, calculated by the clusGap function (R package cluster) with parameters ‘Kmax = 10, bootstrap = 10000’, was used to assess the number of clusters obtained from the hierarchical clustering. In this analysis, the optimum number of clusters was estimated to be seven. All statistical tests were performed with R software (version 3.3.1). […]
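The clustering step described above (Pearson correlation of abundance time series, average linkage) can be sketched in plain Python as follows. This is an illustrative re-implementation, not the MultiExperiment Viewer code. The distance definition 1 - r, the function names, and the toy abundance profiles are assumptions, and the number of clusters is passed in directly, whereas the study chose it with the gap statistic.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    names = list(profiles)
    # Assumed distance: 1 - Pearson r, so co-varying contigs are close together.
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy FPKM time series (made-up values): two rising and two falling contigs.
profiles = {
    "contig1": [1, 2, 3, 4, 5],
    "contig2": [2, 4, 6, 8, 11],
    "contig3": [5, 4, 3, 2, 1],
    "contig4": [9, 8, 6, 4, 1],
}
clusters = average_linkage(profiles, 2)
```

Because the distance depends only on correlation, contigs with very different absolute abundance but the same temporal pattern end up in the same cluster, which suits a comparison of diel dynamics.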

Pipeline specifications

Software tools: HHsearch, HMMER, Bowtie
Databases: Pfam
Application: Amino acid sequence alignment
Organisms: Viruses, Bacteria