Protocol publication

[…] We trimmed and filtered the paired-end sequencing reads using Sickle v1.200, applying a sliding-window approach that trims regions where the average base quality drops below 20. We then discarded reads shorter than 10 bp. We used PANDAseq (v2.4) with a minimum overlap of 50 bp to assemble the forward and reverse reads into a single sequence spanning the entire V4 region. After obtaining the consensus sequences from each sample, we used UPARSE (v7.0.1001), as described in https://bitbucket.org/umerijaz/amplimock/src, for operational taxonomic unit (OTU) construction. The approach was as follows: we pooled the reads from the different samples and added barcodes to track the sample each read originated from. We then dereplicated the reads, sorted them by decreasing abundance, and discarded singletons. Reads were then clustered at 97% similarity, discarding reads shorter than 32 bp. Although the cluster_otus command in USEARCH removes reads that have chimeric models built from more abundant reads, a few chimeras may be missed, especially if their parents are absent from the reads or present at very low abundance. Therefore, in the next step, we applied reference-based chimera filtering using the gold database (http://drive5.com/uchime/uchime_download.html), which is derived from the ChimeraSlayer reference database in the Broad Microbiome Utilities (http://microbiomeutil.sourceforge.net/). The original barcoded reads were matched against the clean OTUs at 97% similarity (a proxy for species-level separation) to generate OTU tables for the different samples. The representative OTUs were then taxonomically classified against the RDP database using the standalone RDP Classifier v2.6 with the default --minWords option of 5.
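The dereplication step described above (collapse identical reads, sort by decreasing abundance, discard singletons) can be illustrated with a minimal Python sketch. This is not the USEARCH implementation; the function name `dereplicate`, the `min_size` parameter, and the toy reads are assumptions made for illustration only.

```python
from collections import Counter


def dereplicate(reads, min_size=2):
    """Collapse identical reads and keep those seen at least `min_size` times.

    reads: iterable of sequence strings, already pooled across samples.
    min_size: minimum copy number to keep; 2 discards singletons.
    Returns a list of (sequence, count) tuples sorted by decreasing count.
    """
    counts = Counter(reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_size]
    # Sort by decreasing abundance; ties are broken by sequence for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy pooled reads (hypothetical): one sequence seen 3 times, one twice, one singleton.
pooled = ["ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC"]
unique_reads = dereplicate(pooled)
print(unique_reads)
```

In this toy input the singleton "GGCC" is dropped, and the remaining unique sequences are ordered so that the most abundant read would be considered first as a cluster seed.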
To find the phylogenetic distances between OTUs, we first performed a multiple sequence alignment of the OTUs using MAFFT v7.040 and then ran FastTree v2.1.7 on the alignment to generate an approximately-maximum-likelihood phylogenetic tree.
[...] Whole-genome shotgun metagenomics reads were trimmed for Nextera adaptors and low-quality ends using Trimmomatic. These were screened against the hg18 human reference genome using Bowtie2, with any matching sequences discarded. Reads were subsampled to 2 million reads and assigned to functional categories through alignment to Kyoto Encyclopedia of Genes and Genomes (KEGG) release 58.0 (April 1, 2011) using RAPSearch2 with a translated nucleotide-to-amino-acid search. Alignments were assigned to KEGG metabolic pathways using HUMAnN. HUMAnN uses KEGG orthology as well as orthologous families of genes and reports pathway coverage as presence/absence. It also uses MinPath to filter out pathways that have very little evidence.
[...] Statistical analysis was performed in R and was similar for the 16S rRNA and metagenomic datasets unless otherwise stated. Where appropriate, the abundance data were normalised using a log-relative transformation before downstream statistical analysis. To find OTUs that differ significantly between conditions, we used the DESeq2 package. This uses a negative binomial distribution to model the abundance data (OTU frequencies) and empirical Bayes to shrink OTU-wise dispersions, in order to identify the OTUs that have the maximum log-fold changes between conditions. Differential abundance (differential expression in DESeq2 terms) was tested by performing a Wald test on shrunken log-fold changes, adjusted for multiple comparisons. For community analysis (including alpha and beta diversity analyses) we used the vegan package, in particular the functions adonis for PERMANOVA and betadisper for the analysis of multivariate homogeneity of group dispersions.
The p-values reported in these cases were those returned by the functions themselves. Microbial compositional structure was assessed using non-metric multidimensional scaling (NMDS) plots. We applied both the Bray-Curtis dissimilarity index, which considers bacterial taxon presence and abundance, and the unweighted UniFrac distance, which takes into account the phylogenetic distances (relatedness) between bacterial taxa but not their abundance. Specifically, for the unweighted UniFrac distance, the abundance table was converted to a presence/absence table. The taxa present in one or both samples were then placed on the phylogenetic tree. The distance between two samples was calculated as the sum of the unshared branch lengths (leading to taxa found in only one sample) divided by the sum of all branch lengths, both shared (leading to taxa common to both samples) and unshared, for that pair of samples. To calculate UniFrac distances, we used the phyloseq package. We also performed local contribution to β-diversity (LCBD) analysis to measure the contribution of each sample to the total OTU β-diversity, calculated from all study samples together (% of total community dispersion). Samples with high LCBD are markedly different from the average β-diversity of all study samples. For differences in metagenomic metabolic pathways, we used the Kruskal-Wallis test. For correlations between short-chain fatty acids (SCFA) and discriminatory OTUs, we used Kendall rank correlation. For α-diversity, a subanalysis was also performed accounting for the genetic relatedness of participants from the CD and CDR groups (paired data). We used the Benjamini-Hochberg correction for multiple testing in all analyses. The authors maintain the general scripts as well as tutorials for the above analyses at http://userweb.eng.gla.ac.uk/umer.ijaz#bioinformatics. […]
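The two distances described above can be worked through on a toy example. The sketch below is an illustration in plain Python, not the phyloseq or vegan implementation. The four-branch tree, the OTU names A, B and C, the sample counts, and all function names are hypothetical. It also assumes that every branch on the path from a present taxon to the root counts towards the distance.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    taxa = set(counts_a) | set(counts_b)
    diff = sum(abs(counts_a.get(t, 0) - counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return diff / total


def present(counts):
    """Convert abundances to presence/absence: the set of taxa with count > 0."""
    return {taxon for taxon, n in counts.items() if n > 0}


def branches_to_root(taxa, parent):
    """Branches (named by their child node) on the paths from the taxa to the root."""
    covered = set()
    for node in taxa:
        while node in parent:  # the root has no entry in `parent`
            covered.add(node)
            node = parent[node][0]
    return covered


def unweighted_unifrac(taxa_a, taxa_b, parent):
    """Unshared branch length divided by total branch length for two samples."""
    a = branches_to_root(taxa_a, parent)
    b = branches_to_root(taxa_b, parent)
    unshared = sum(parent[n][1] for n in a ^ b)  # branches seen in one sample only
    total = sum(parent[n][1] for n in a | b)     # branches seen in either sample
    return unshared / total


# Hypothetical tree: node -> (parent node, branch length). A and B share ancestor n1.
parent = {"n1": ("root", 1.0), "A": ("n1", 1.0), "B": ("n1", 2.0), "C": ("root", 3.0)}

sample1 = {"A": 5, "B": 0, "C": 3}
sample2 = {"A": 2, "B": 4, "C": 1}

bc = bray_curtis(sample1, sample2)
uf = unweighted_unifrac(present(sample1), present(sample2), parent)
print(bc, uf)
```

Here the Bray-Curtis dissimilarity is (3 + 4 + 2) / (8 + 7) = 9/15. For unweighted UniFrac, only the branch leading to B (length 2) is unshared out of a total branch length of 7, giving 2/7. The abundance differences in A and C affect the first measure but not the second.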

Pipeline specifications

Software tools: PANDAseq, UPARSE, USEARCH, UCHIME, ChimeraSlayer, MAFFT, FastTree, Trimmomatic, Bowtie2, RAPSearch, HUMAnN, DESeq2, phyloseq
Databases: KO KEGG
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens, Bacteria
Diseases: Crohn Disease
Chemicals: Fatty Acids