Computational protocol: A human gut phage catalog correlates the gut phageome with type 2 diabetes

[…] An operational taxonomic unit (OTU) is defined as a group of closely related individuals that share a given set of observed characters or undetermined evolutionary relationships. We bioinformatically defined phage OTUs (pOTUs) (Fig. b; see also Additional file : Materials and Methods for details) to profile the phageome and examine the interactions between phages and bacteria in the gut ecosystem. Briefly, we constructed an Expanded Phage-Specific Gene database (EPSGDB) according to the Phage Orthologous Groups (POGs) and the phage classification defined by the International Committee on Taxonomy of Viruses (ICTV). A pOTU was defined as the collection of all phages that have the same taxonomic names at all five levels (group|order|family|subfamily|genus) and whose hosts belong to the same bacterial genus; a pOTU is not necessarily associated with a species or genus. Thus, a pOTU was named with an ordered list of the phage names at all five levels plus the genus name of the bacterial host. In short, a pOTU represents a group of phages with homologous phage taxon-specific genes and hosts of the same bacterial genus. The relative abundance of a pOTU in a sample was calculated by summing the numbers of all phage genomes belonging to the pOTU and dividing by the number of 16S rRNA gene reads in the sample.

Fig. 2

MetaPhlAn was run with default flags to profile the bacterial genera in each sample from the metagenomic reads. [...]

The genes on the putative large phage scaffolds were predicted by MetaGeneMark with default flags. Functional annotation was then performed by comparing the genes to several databases, including the GenBank nr database, the COG (Clusters of Orthologous Groups) database, the TIGR (The Institute for Genomic Research) Microbial Database, and the Pfam database of conserved amino acid motifs, using RPSblast (Reversed Position Specific BLAST) with an e-value cutoff of 1e−10.
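
The pOTU naming and relative-abundance rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary inputs, the placeholder taxon names, and the choice of "|" to join the host genus to the five taxonomic levels are all hypothetical.

```python
from collections import defaultdict


def potu_name(group, order, family, subfamily, genus, host_genus):
    """Build a pOTU name from the five phage taxonomic levels plus the host genus.

    Assumption: the six fields are joined with "|"; the source only states
    that the name is an ordered list of these fields.
    """
    return "|".join([group, order, family, subfamily, genus, host_genus])


def potu_relative_abundance(genome_counts, genome_to_potu, n_16s_reads):
    """Relative abundance of each pOTU in one sample.

    genome_counts  : dict, phage genome ID -> number detected in the sample
    genome_to_potu : dict, phage genome ID -> pOTU name
    n_16s_reads    : number of 16S rRNA gene reads in the same sample
    """
    if n_16s_reads <= 0:
        raise ValueError("n_16s_reads must be positive")

    totals = defaultdict(float)
    for genome, count in genome_counts.items():
        potu = genome_to_potu.get(genome)
        if potu is None:
            continue  # genome not assigned to any pOTU (hypothetical handling)
        totals[potu] += count

    # Sum of phage genome counts in the pOTU divided by the 16S read count.
    return {potu: total / n_16s_reads for potu, total in totals.items()}


if __name__ == "__main__":
    # Placeholder taxon names, for illustration only.
    name = potu_name("group_x", "order_x", "family_x",
                     "subfamily_x", "genus_x", "host_genus_x")
    counts = {"genome_1": 3, "genome_2": 1}
    mapping = {"genome_1": name, "genome_2": name}
    print(potu_relative_abundance(counts, mapping, n_16s_reads=1000))
```

Normalizing by the 16S rRNA gene read count expresses phage abundance relative to the bacterial signal in the same sample, so values are comparable across samples of different sequencing depth.
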
Virulence genes were identified by comparing the genes against the VFDB (Virulence Factor Database) with an e-value cutoff of 1e−05. [...]

Large subunit terminase (LST) sequences were selected as the marker for building phylogenetic trees of gut phages. The protein sequences annotated as LST were extracted from all ENA phage sequences. Pfam domains were searched using the hmmsearch program in the HMMER3 package (e-value cutoff 1e−05), and four domains, Terminase_1 (PF03354), Terminase_3 (PF04466), Terminase_6 (PF03237) and Terminase_GpA (PF05876), were found on the LST sequences. Thereafter, all the protein sequences encoded by the large phage scaffolds were searched against the Pfam database, and the protein sequences carrying one of the four functional domains were included in the tree construction. The sequences were aligned and maximum likelihood trees were constructed with MEGA using 1000 bootstraps. The trees were visualized by the FigTree software ( [...]

The co-correlation/exclusion between the gut bacteria and phages was calculated from the relative numbers of bOTUs and pOTUs by SparCC (with p < 0.01 and correlation > 0.3). Only bacterial genera and pOTUs detected in more than 72 samples (≥ 50%) were considered. The network was laid out and visualized with a circular layout in Cytoscape. Only edges with a correlation greater than 0.3 and a p value less than 0.01 were shown, and unconnected nodes were omitted. [...]

The richness of the phageome in the 145 samples of stage I was estimated with the R package vegan based on the Chao2 richness estimator. Two rarefaction curves of the gut phageome were generated, one for the control samples and one for the T2D samples. The linear discriminant analysis (LDA) scores of the variations of bacterial genera and pOTUs between control samples and T2D samples were calculated and visualized by LEfSe (LDA Effect Size).
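
The prevalence and edge thresholds used for the bacteria-phage network above can be sketched as follows. This is a hedged illustration of the filtering logic only: it assumes the correlations and p values have already been produced (for example by SparCC), and the function names, the tuple layout of an edge, and the `use_abs` switch are hypothetical. The source states "correlation > 0.3" without saying whether negative (exclusion) correlations were thresholded by absolute value, so that choice is exposed as a parameter rather than asserted.

```python
def prevalent_features(presence, min_samples=72):
    """Return features detected in MORE than `min_samples` samples.

    presence: dict, feature name -> iterable of sample IDs where it was detected
    """
    return {
        feature
        for feature, samples in presence.items()
        if len(set(samples)) > min_samples
    }


def filter_edges(edges, keep, min_corr=0.3, max_p=0.01, use_abs=True):
    """Keep edges between prevalent features that pass both thresholds.

    edges   : iterable of (node_a, node_b, correlation, p_value) tuples
    keep    : set of feature names that passed the prevalence filter
    use_abs : assumption; if True, |correlation| is compared with min_corr so
              that strong negative (exclusion) edges are retained as well
    """
    kept = []
    for node_a, node_b, corr, p_value in edges:
        if node_a not in keep or node_b not in keep:
            continue
        strength = abs(corr) if use_abs else corr
        if strength > min_corr and p_value < max_p:
            kept.append((node_a, node_b, corr, p_value))
    return kept


def connected_nodes(edges):
    """Nodes that still have at least one edge; all others are omitted."""
    return {node for node_a, node_b, _, _ in edges for node in (node_a, node_b)}


if __name__ == "__main__":
    presence = {
        "pOTU_a": range(73),    # detected in 73 samples: kept
        "pOTU_b": range(72),    # detected in exactly 72 samples: dropped
        "genus_c": range(100),  # kept
    }
    keep = prevalent_features(presence)
    edges = [
        ("pOTU_a", "genus_c", 0.45, 0.001),
        ("pOTU_b", "genus_c", 0.90, 0.001),
    ]
    kept = filter_edges(edges, keep)
    print(kept, connected_nodes(kept))
```

The surviving edge list and node set are what would be exported to a network viewer such as Cytoscape for layout.
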
Based on the relative abundances of the highly prevalent pOTUs detected in more than 120 samples (70% of 145 samples), the significance of the variation of the gut phages between the T2D and control groups was assessed by the Mann–Whitney rank-sum test with FDR correction. The significance of the variation in the relative abundances of the identified phage scaffolds between the T2D and control groups was likewise assessed by the Mann–Whitney rank-sum test. […]

Pipeline specifications

Software tools: MetaPhlAn, BLASTN, HMMER, MEGA, FigTree, SparCC, vegan, LEfSe
Databases: Pfam, VFDB, ICTV
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens, Bacteria, Escherichia coli
Diseases: Diabetes Mellitus; Diabetes Mellitus, Type 2; Machado-Joseph Disease