Computational protocol: Fecal metagenomic profiles in subgroups of patients with myalgic encephalomyelitis/chronic fatigue syndrome

Protocol publication

[…] Shotgun metagenomic sequencing (SMS) was carried out on DNA extracts obtained from the 100 fecal samples (50 cases and 50 controls). For Illumina library preparation, genomic DNA was sheared to a 200-bp average fragment length using a Covaris E210 focused ultrasonicator. Sheared DNA was purified and used for Illumina library construction with the KAPA Hyper Prep kit (KK8504, Kapa Biosystems). Sequencing libraries were quantified using an Agilent Bioanalyzer 2100. Sequencing was carried out on the Illumina HiSeq 4000 platform (Illumina, San Diego, CA, USA). SMS libraries from cases and controls were grouped into 10 pools (10 individuals/pool). Each pool yielded an average of 350 million 100-bp, paired-end reads (mean = 7 Gb of sequence data per sample; median = 6 Gb).

Raw SMS data were pre-processed using PRINSEQ (v0.20.3) for end trimming and filtered to exclude low-quality and low-complexity reads. Adaptor sequences were removed using cutadapt (v1.8.3). Human sequences were subtracted from the dataset by mapping with Bowtie2 (v2.1.0) against genomic, mitochondrial, and ribosomal sequences downloaded from NCBI. Bacterial composition (relative abundance) was obtained from raw sequencing data using MetaPhlAn (v1.7.8) and processed with QIIME (v1.8). To evaluate overall microbiome differences, we used principal coordinate analysis based on the Bray-Curtis dissimilarity metric. Metabolic pathway analysis was carried out on host-subtracted sequences using HUMAnN2 (v0.7.1).

[...] Between-group differences (ME/CFS, ME/CFS + IBS, ME/CFS without IBS, and controls) in microbial composition, fecal metabolic pathway expression, plasma immune molecules, and symptom severity were tested using the nonparametric Mann-Whitney U test. The Benjamini-Hochberg false discovery rate (FDR) method was used to control the type I error rate at the 0.2 level.
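
The multiple-testing step described above can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment at the 0.2 level. This is not the authors' code: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the p-values are assumed to come from upstream Mann-Whitney U tests (one per taxon, pathway, or immune marker).

```python
def benjamini_hochberg(p_values, q=0.2):
    """Return (adjusted p-values, rejection flags), both in input order.

    q is the FDR level; 0.2 matches the threshold stated in the protocol.
    """
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [a <= q for a in adjusted]
    return adjusted, rejected


# Hypothetical p-values, for illustration only (not data from the study).
example_p = [0.001, 0.04, 0.15, 0.30, 0.80]
adjusted_p, significant = benjamini_hochberg(example_p, q=0.2)
for p, a, s in zip(example_p, adjusted_p, significant):
    print(f"p={p:.3f}  adjusted={a:.3f}  significant={s}")
```

In this toy input only the two smallest p-values remain below the 0.2 level after adjustment. In practice a library routine would normally be preferred over a hand-written version.
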
Correlations between bacterial species and disease score were examined using nonparametric Spearman correlation.

Bacterial metagenomic and immune profiling assay data were used to develop a logistic regression model for prediction of the following binary response variables: the diagnostic groups ME/CFS, ME/CFS + IBS, ME/CFS without IBS, and controls. To eliminate potential multicollinearity, we used least absolute shrinkage and selection operator (LASSO) and random forest (RF) feature selection techniques to reduce the high-dimensional data to a representative set of variables. Partial least squares (PLS) regression was used to determine the contributions of individual variables to the latent variable that explained the largest portion of the covariance. In-sample receiver operating characteristic (ROC) curves were plotted and the area under the curve (AUC) was measured to compare models. To assess the predictive accuracy of the logistic regression models, random resampling cross-validation was performed with 1000 iterations. Data were randomly split into a training set (80%) and a test set (20%) within each iteration. AUC values, prediction error rates, and false positive and false negative rates were then averaged across iterations for all test sets. Sex, age, race, ethnicity, BMI, site, and season of sample collection were included in all statistical models as potential confounders.

Differences in the relative abundance of bacteria at all taxonomic levels were determined with linear discriminant analysis effect size (LEfSe), which couples tests of statistical significance with measures of effect size to rank the relevance of differentially abundant taxa. Thus, the Kruskal-Wallis test identifies taxa that are significantly different in relative abundance among different classes, and the linear discriminant analysis (LDA) identifies the effect size with which these taxa differentiate the classes.
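
The random resampling cross-validation described in this passage (1000 iterations, 80% training and 20% test) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `fit_mean_difference` is a hypothetical single-feature stand-in for the logistic regression fit, the data are synthetic, only AUC is averaged (the error and false positive/negative rates are omitted), and skipping splits that lack both classes is an assumed handling the source does not describe.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(train_x, train_y):
    """Hypothetical stand-in model: orient one feature toward the case group."""
    case_mean = mean(x for x, y in zip(train_x, train_y) if y == 1)
    ctrl_mean = mean(x for x, y in zip(train_x, train_y) if y == 0)
    sign = 1.0 if case_mean >= ctrl_mean else -1.0
    return lambda x: sign * x


def repeated_holdout_auc(x, y, fit, n_iter=1000, test_fraction=0.2, seed=0):
    """Average test-set AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    n_test = max(1, round(test_fraction * len(x)))
    aucs = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_y = [y[i] for i in train_idx]
        if len(set(train_y)) < 2:
            continue  # assumed handling: cannot fit without both classes
        model = fit([x[i] for i in train_idx], train_y)
        value = auc([model(x[i]) for i in test_idx], [y[i] for i in test_idx])
        if value is not None:  # assumed handling: skip one-class test sets
            aucs.append(value)
    if not aucs:
        return None, 0
    return mean(aucs), len(aucs)


# Synthetic data for illustration only: 50 "cases" and 50 "controls".
data_rng = random.Random(1)
labels = [1] * 50 + [0] * 50
feature = [data_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
mean_auc, n_used = repeated_holdout_auc(feature, labels, fit_mean_difference)
print(f"mean test AUC over {n_used} splits: {mean_auc:.3f}")
```

Passing the model as a `fit` callback keeps the resampling loop independent of the classifier, so a real logistic regression could be substituted without changing the splitting logic.
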
For each LEfSe analysis, an alpha value of 0.05 for the Kruskal-Wallis test and a log-transformed LDA score of 2.0 were used as thresholds for significance. LEfSe analyses were used to evaluate differences among the fecal microbiomes of the ME/CFS, ME/CFS + IBS, ME/CFS without IBS, and control groups.

Data were analyzed and visualized with SPSS (IBM, NY), Matlab (R2013a, The Mathworks Inc., MA), Prism 7 (GraphPad Software, CA), BioVenn, and Circos software. A genomic data analyzer (Multiple Experiment Viewer, MeV 4.8, MA) was used to define the clustering of metagenomic and immune profile data (with Spearman correlation and Euclidean distance metrics). […]
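
The LEfSe significance thresholds stated above (Kruskal-Wallis alpha of 0.05 and a log-transformed LDA score of 2.0) amount to a two-condition filter on per-taxon results. The sketch below shows only that filtering step and does not reproduce LEfSe itself. The `TaxonResult` record, the taxon names, and the numbers are hypothetical, and whether each comparison is strict or inclusive is an assumption.

```python
from typing import NamedTuple


class TaxonResult(NamedTuple):
    taxon: str
    kw_p_value: float  # Kruskal-Wallis p-value for the taxon
    lda_score: float   # log10-transformed LDA effect size (sign marks the class)


def passes_lefse_thresholds(result, alpha=0.05, lda_cutoff=2.0):
    """True if the taxon meets both the significance and effect-size thresholds."""
    return result.kw_p_value < alpha and abs(result.lda_score) >= lda_cutoff


# Hypothetical per-taxon results, for illustration only.
results = [
    TaxonResult("taxon_A", 0.001, 3.4),
    TaxonResult("taxon_B", 0.030, 1.2),   # significant, but effect size too small
    TaxonResult("taxon_C", 0.200, 2.8),   # large effect, but not significant
    TaxonResult("taxon_D", 0.010, -2.5),
]

# Keep taxa passing both thresholds, ranked by absolute effect size.
selected = sorted(
    (r for r in results if passes_lefse_thresholds(r)),
    key=lambda r: abs(r.lda_score),
    reverse=True,
)
selected_names = [r.taxon for r in selected]
print(selected_names)
```
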

Pipeline specifications

Software tools: PRINSEQ, cutadapt, Bowtie2, MetaPhlAn, QIIME, HUMAnN, LEfSe, BioVenn, Circos
Applications: Metagenomic sequencing analysis, Genome data visualization
Organisms: Homo sapiens, Bacteroides vulgatus, Bacteria
Diseases: Lymphatic Diseases, Fatigue Syndrome, Chronic, Irritable Bowel Syndrome
Chemicals: Atrazine, Vitamin B 6