Computational protocol: A Bayesian Semiparametric Regression Model for Joint Analysis of Microbiome Data

Similar protocols

Protocol publication

[…] We conducted simulation studies to assess the performance of our model. We compared the model to an alternative model, the negative binomial mixed model (NBMM) in Zhang et al. (). We assumed a sample of J = 200 OTUs. We used the same time points (ti) and numbers of replicates (Ki) of our ocean microbiome data as shown in Figure . We let βj,pTR=0 with probability 0.85. For βj,pTR≠0 we simulated βj,pTR from either of N(−1.5, 0.052) or N(1.5, 0.052) with equal probability, where N(a, b2) denotes the normal distribution with mean a and variance b2. It implies that a covariate has no effect on OTU abundance with probability 0.85 or may significantly affect mean abundance with the remaining probability 0.15. To specify rt,kTR and α0,jTR, we did not assume any distribution and used their classical estimates from our ocean microbiome data; following Witten (), we first computed estimates of sample size factors rti,k′ and OTU size factors α0,j′ using the ocean microbiome data, rti,k′=yti,k,·/y··· and α0,j′=1N∑i=1n∑k=1Kiyti,k,j/rti,k′ where yti,k,·=∑j=1Jyti,k,j and y···=∑j=1Jy··,j. We then randomly sampled from the pool of rt,k′ and α0,j′ to specify the true values. To simulate temporal dependence in OTU abundance, we let α~ti,jTR=ati,jcos(2π(t~i-bti,j))+cti,j(t~i-t~⋆)2. Here t~i denotes time ti in year and t~⋆ the median of t~i. We let at,j~iidN(0.15,0.12), bt,j~iidN(0,0.52), and ct,j~iidN(0.1,0.12) to have different patterns for OTUs. For some OTUs, α~ti,jTR are illustrated in red squares in Figures 4E–G. We generated sjTR~iidGa(1,10). We used the covariate matrix of the ocean microbiome data illustrated in Figure for the simulation study. For the missing covariates in the data, we generated a value of possible categories with equal probability. We finally simulated OTU counts yti,k, j from the negative binomial distribution yti,k,j~indepNB(μti,k,jTR,sjTR), where μti,k,jTR=rti,kTRα0,jTRexp(α~ti,jTR+XtTRβjTR).For comparison, we used the negative binomial mixed model (NBMM) in Zhang et al. (). Similar to the proposed model, the NBMM uses a negative binomial distribution with mean μNBMM and shape parameter θNBMM to model OTU counts and assumes log(μt,k,jNBMM)=log(yt,k,·)+β0,jNBMM+tβjNBMM+Zt,kbjNBMM where Xt and Zt,k are the covariate matrices for fixed effects and random effects, respectively. It assumes random effects bjNBMM~N(0,Ψ). By letting the replicates at a time point share the same random effect, OTU abundances in the replicates at a time point are correlated. The NBMM normalizes OTU counts by sample total counts. That is, sample total counts yt,k, · are used as an offset to adjust for the variability in total counts across samples. Similar to other existing methods, the NBMM performs separate analyses of OTUs. An iterative weighted least squares algorithm is developed to produce the MLEs under the NBMM and implemented in a R function glmm in R package BhGLM. [...] We applied the proposed statistical method to ocean microbiome data. Seawater samples were collected weekly at the end of Santa Cruz Municipal Wharf (SCW), Monterey Bay (36.958 oN, −122.017 oW), with an approximate depth of 10 m. SCW is one of the ocean observing sites in Central and Northern California (CenCOOS), where harmful algal bloom species [HAB species: Alexandrium (Ax), Dinophysis (Dp), Pseudo-nitzschia (Pn) etc.] are monitored weekly along with nutrient measurements [ammonia (NH4), silicate (Si), nitrate (N), phosphate (P)], temperature (T), domoic acid (DA) concentration, and chlorophyll (Chl). Details of phytoplankton net tow sampling of measuring phytoplankton abundance, measurement of physical (nutrients and temperature) and biological parameters (chlorophyll α and DA concentration) are described in Sison-Mangus et al. (). Pseudo-nitzschia, Dinophysis, and Alexandrium cells were counted with a Sedgewick rafter counter under the microscope. Data for physical and biological factors are available from the website link http://www.sccoos.org/query/?project=Harmful%20Algal%20Blooms&study[]=Santa%20Cruz%20Wharf. Among the 10 variables, the concentration levels of Alexandrium, Dinophysis, Pseudo-nitzschia, and domoic acid have highly right-skewed distributions and are discretized into categories based on their biological properties for our analysis. The ranges of the concentration levels for the discretization are in Supplementary Table and Figures illustrates all covariates included for analysis. The values of −1, 0, 1, 2, 3, and 4 represent missing values and the categories of None, Low, Medium, High, and Very High for the discretized variables, respectively. Due to high right skeweness, categories corresponding to high concentration levels have low frequencies. Values of the Dinophysis concentration level are missing at 20 time points among 55 points used for analysis. Sample correlations between the factors are relatively strong. Figures shows scatterplots for some selected pairs of the factors.For bacterial RNA samples, three depth-integrated (0, 5, and 10 ft) water samples were collected at a total of 55 time points between April 2014 and November 2015. Two or three samples are sequenced at each time point. The numbers of replicates at the time points are illustrated in Figure . Microbial RNA in the samples was extracted for 16S rRNA sequencing. Post-processing of sequences was performed using the Quantitative Insights Into Microbial Ecology (QIIME 1.9.1) pipeline. A total of nearly 39,823 OTUs were obtained in data after removing singletons. We restricted our attention to OTUs that have greater than or equal to five counts on average. The rule leaves in the end J = 263 OTUs for the 150 samples for the analysis. A heatmap of the counts in the filtered data is shown in Figure . The primary goal of the study is to investigate the effects of physical and biological factors on abundance levels of OTUs, while accounting for baseline abundance levels of OTUs in samples. […]

Pipeline specifications

Software tools BhGLM, QIIME
Applications 16S rRNA-seq analysis, GWAS
Diseases Bloom Syndrome