Computational protocol: Intestinal Microbiota Composition of Interleukin-10 Deficient C57BL/6J Mice and Susceptibility to Helicobacter hepaticus-Induced Colitis

Protocol publication

[…] Sequences were pre-processed using the software package mothur (Linux 32-bit version 1.22.1 and Windows 32-bit version 1.21.1) following the suggestions in the “Schloss SOP” tutorial. Briefly, primers and barcodes were removed, and sequences were trimmed where the average quality score over a 50-bp window dropped below 35. Sequences were considered low quality and culled if they contained homopolymer stretches longer than 8 bp, included ambiguous base calls, or were shorter than 150 bases. Remaining non-identical sequences were aligned to the “SILVA” reference alignment with mothur’s align.seqs algorithm, using the default settings of 8-mer searching and Needleman-Wunsch alignment. Sequences that did not align in the expected region of the reference alignment were removed. PCR chimeras were identified using UCHIME with the mothur-provided “SILVA gold” reference set and culled.

The resulting dataset was analyzed with Ribosomal Database Project (RDP) classifier version 2.4 with the hierarchy model of training data no. 7. Sequences were classified to the lowest taxonomic rank that received at least 70% bootstrap support (BTS). Following Ochman et al., sequences that could not be assigned to class at BTS 70% were excluded from further analyses, except when they were classified to lower taxonomic ranks for which no class was designated. Potential organellar sequences were further analyzed by blastn similarity search against the SILVA reference database, build 108. To allow sequences to be reliably identified to species, the RDP 16S rRNA database (release 10, update 28, matching RDP classifier training data no. 7) was modified using TaxCollector to include taxonomic descriptions with easily identifiable species designations. Query sequences identified to genus level by RDP classifier at a stringent cutoff of 97% BTS were searched against this TaxCollector-modified RDP database using MegaBlast.
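The two bootstrap cutoffs described above (70% BTS for the reported rank, 97% BTS at genus level as the entry condition for the species search) can be illustrated with a minimal sketch. The tuple layout, the function names and the example lineage are hypothetical and are not part of RDP classifier or mothur; the sketch also assumes that the lineage is ordered from domain down to genus and that bootstrap support does not increase toward lower ranks.

```python
from typing import List, Optional, Tuple

# One classifier assignment: (rank, taxon name, bootstrap support in percent).
# This layout is a hypothetical convenience, not RDP classifier's output format.
Assignment = Tuple[str, str, float]


def lowest_supported_rank(
    lineage: List[Assignment], min_bts: float = 70.0
) -> Optional[Assignment]:
    """Return the deepest assignment with bootstrap support >= min_bts.

    Assumes the lineage runs from the highest rank to the lowest and that
    support never increases toward lower ranks, so the walk stops at the
    first rank that falls below the cutoff.
    """
    best = None
    for assignment in lineage:
        if assignment[2] < min_bts:
            break
        best = assignment
    return best


def qualifies_for_species_search(
    lineage: List[Assignment], genus_bts: float = 97.0
) -> bool:
    """True if the genus-level assignment reaches the stringent cutoff."""
    return any(
        rank == "genus" and bts >= genus_bts for rank, _name, bts in lineage
    )


# Invented example lineage, for illustration only.
example_lineage = [
    ("domain", "Bacteria", 100.0),
    ("phylum", "Firmicutes", 100.0),
    ("class", "Clostridia", 98.0),
    ("order", "Clostridiales", 98.0),
    ("family", "Lachnospiraceae", 85.0),
    ("genus", "Roseburia", 42.0),
]

if __name__ == "__main__":
    print(lowest_supported_rank(example_lineage))
    print(qualifies_for_species_search(example_lineage))
```

In this invented example the sequence would be reported at family level and would not be passed on to the species-level search.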
Sequences were considered annotated to species level if their identity to the best hit in the database was at least 97%, the aligned region covered at least 97% of the query sequence length, they did not match a second species at the same BLAST Expectation value, and the identified species did not conflict with the genus as identified by RDP classifier.

To facilitate comparison of these data with the altered Schaedler flora, the sequences provided by Dewhirst et al. were similarly analyzed with RDP classifier.

OTUs were calculated using ESPRIT-Tree. To avoid artifacts due to the inclusion of the additional sequence stretch in the longer MHH2011 sequences, all sequences were first trimmed to the region corresponding to the 8F-338R amplicon as identified in the mothur-generated alignment. Following Cai and Sun, the OTU cutoff level that best represents species level was determined using the normalized mutual information (NMI) criterion on the subset of sequences that could be identified to species. For these sequences, the NMI between species and OTU classification was calculated following Fred and Jain. After comparing NMI values for OTUs obtained for 30 distance levels from 0.01 to 0.3, we retained the OTUs at level 0.08, which resulted in peak NMI values. To link all sequences in the dataset with both OTU and taxonomic information, OTUs were classified to the lowest possible rank based on the 80% consensus of their respective RDP classifications using a custom VBA script. Additionally, OTU consensus sequences were computed in Geneious Basic v. 5.6.4 based on alignments generated using MUSCLE. For alignment of OTU 0.08_group_279, which contained too many sequences to be computed in MUSCLE, we used MAFFT with the settings “FFT-NS-2 --ep 0.123”. Consensus sequences were classified using the RDP classifier and the TaxCollector-modified RDP database as detailed above.
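The NMI-based choice of the OTU cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the normalization NMI = 2·I(species; OTU) / (H(species) + H(OTU)), and the function names and the toy labels in the usage note are invented.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(
    species: Sequence[Hashable], otus: Sequence[Hashable]
) -> float:
    """NMI between two labellings of the same sequences.

    Assumed normalization: 2 * I(species; OTU) / (H(species) + H(OTU)).
    """
    if len(species) != len(otus) or not species:
        raise ValueError("labellings must be non-empty and of equal length")
    n = len(species)
    joint = Counter(zip(species, otus))
    species_sizes = Counter(species)
    otu_sizes = Counter(otus)
    mutual_information = sum(
        (c / n) * math.log(c * n / (species_sizes[s] * otu_sizes[o]))
        for (s, o), c in joint.items()
    )
    denominator = entropy(species) + entropy(otus)
    if denominator == 0.0:
        # Both labellings put everything in one cluster: treat as identical.
        return 1.0
    return 2.0 * mutual_information / denominator


def best_cutoff(
    species: Sequence[Hashable],
    otus_by_cutoff: Dict[float, Sequence[Hashable]],
) -> float:
    """Return the distance cutoff whose OTU labelling maximizes NMI."""
    return max(
        otus_by_cutoff,
        key=lambda cutoff: normalized_mutual_information(
            species, otus_by_cutoff[cutoff]
        ),
    )
```

For example, with four sequences from two species, `best_cutoff(["a", "a", "b", "b"], {0.01: [1, 2, 3, 4], 0.08: [1, 1, 2, 2], 0.3: [1, 1, 1, 1]})` selects the cutoff whose OTUs coincide with the species labels, since both over-splitting and lumping lower the NMI.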
As OTU consensus sequences could often be classified to a lower taxonomic rank than the majority of their component sequences, but were only crudely classified if they contained ambiguous bases, we chose the more detailed of the two classifications for each OTU.

To further analyze and visualize the final dataset, a mothur-compatible “sharedfile” was reconstructed from the OTU information. The Windows version of mothur (version 1.21.1) was used for a range of statistical and visualization steps. To avoid artifacts produced by unequal sample sizes, the sequences from each mouse sample were subsampled to match the 1227 sequences found in the sample with the lowest sequence count. Rarefaction curves and rank-abundance plots were generated. Differences among samples were visualized by Principal Coordinates Analysis (PCoA) using Jaccard index distances, which measure dissimilarity of microbiome composition. Mothur was also used to generate Venn diagrams (see below).

A heatmap of OTU abundances among samples was produced using the R script heatmap. Input data for heatmap construction were based on the mothur-type sharedfile, which was transformed to facilitate visualization of the full range of OTU abundances. This transformation was carried out according to the formula h = LN(count + 0.1), where h is the input value for the heatmap script, count is the sequence count for one OTU in an individual sample, LN is the natural logarithm, and 0.1 is a small increment that avoids occurrences of LN(0). The heatmap color key was manually modified to correspond to untransformed OTU abundances.
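The heatmap transformation h = LN(count + 0.1) and its inverse, which is what relabelling the color key with untransformed abundances amounts to, can be written as a short sketch. The function and constant names are hypothetical; the original work used an R script and a manual edit of the color key, not this code.

```python
import math

# Small increment from the formula above; avoids taking the log of zero.
PSEUDOCOUNT = 0.1


def heatmap_value(count: int) -> float:
    """Transform a raw OTU sequence count into the heatmap input value h."""
    if count < 0:
        raise ValueError("sequence counts cannot be negative")
    return math.log(count + PSEUDOCOUNT)


def count_from_heatmap_value(h: float) -> float:
    """Invert the transformation, e.g. to label a color key in raw counts."""
    return math.exp(h) - PSEUDOCOUNT


if __name__ == "__main__":
    for count in (0, 1, 10, 100, 1227):
        print(count, round(heatmap_value(count), 3))
```

An OTU that is absent from a sample maps to LN(0.1), a finite negative value, so absent OTUs remain distinguishable from OTUs with a single sequence.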
To increase readability, individual OTU labels were replaced with taxonomic information and sample labels were replaced with sample numbers and group information using Inkscape 0.48.

Sequences which did not occur in at least 4 samples of either the cecum or the colon samples of at least one mouse group were classified as too rare to judge site specificity and were excluded from the lists of site-specific OTUs and from the Venn diagrams. For additional comparisons, significance of abundance differences between sample groups was assessed using Metastats, as implemented in the original R script, at a significance cutoff of p ≤ 0.05. […]
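The rarity filter applied before judging site specificity can be sketched as below. The sketch rests on one reading of the criterion: an OTU is kept if it is present (count above zero) in at least 4 cecum samples or at least 4 colon samples of at least one mouse group. The data layout, the function name and the group labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Key: (mouse group, sampling site); value: counts of one OTU in each sample
# of that group and site. This layout is an assumption made for illustration.
CountsByGroupSite = Dict[Tuple[str, str], Sequence[int]]


def passes_rarity_filter(
    counts: CountsByGroupSite, min_samples: int = 4
) -> bool:
    """True if the OTU occurs in >= min_samples samples of any group/site."""
    return any(
        sum(1 for c in sample_counts if c > 0) >= min_samples
        for sample_counts in counts.values()
    )


if __name__ == "__main__":
    # Invented counts for a single OTU in one hypothetical mouse group.
    otu_counts = {
        ("group_A", "cecum"): [3, 0, 1, 2, 5],
        ("group_A", "colon"): [0, 0, 0, 1, 0],
    }
    print(passes_rarity_filter(otu_counts))
```

Presence is counted per sample rather than by summed abundance, so an OTU with many sequences in a single sample would still be treated as too rare under this reading.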

Pipeline specifications

Software tools: mothur, UCHIME, RDP Classifier, BLASTN, ESPRIT-Tree, Geneious, MUSCLE, MAFFT, Inkscape, Metastats
Applications: Miscellaneous, Metagenomic sequencing analysis, 16S rRNA-seq analysis, Nucleotide sequence alignment
Organisms: Mus musculus, Helicobacter hepaticus
Diseases: Colitis, Infection, Inflammatory Bowel Diseases