Computational protocol: Reanalysis and Simulation Suggest a Phylogenetic Microarray Does Not Accurately Profile Microbial Communities

Protocol publication

[…] The ISPMA uses a set of hash tables, each containing the probes from a single OTU. A simulated environmental sample was constructed by creating a file containing a number of reference 16S rDNA sequences (in FASTA format). This sample was ‘hybridised to’ the probe sets by turning each reference sequence into its complete set of 25-mers and looking up each 25-mer in each of the OTU probe set hash tables in turn. The counts of unique matches to each OTU set were accumulated over all the reference sequences and reported at the end. Probe sets in which more than a threshold proportion (90%, 92% or 95%) of the probes had matches from any of the reference sequences were then regarded as ‘present’. Examples of how unrelated organisms can share probes and contribute to the counts used to determine OTU presence also came from the ISPMA process: the code that implements this process will accept a single OTU id as a ‘target’, and all matches to this OTU's probe set are then written to an output file for further analysis. To compare the phylogenetic identity of taxa before and after ISPMA analyses, input ‘samples’ and results from the ISPMA were compared using RDP Classifier to ensure consistency of taxonomy. Word clouds of families were constructed using Wordle (Jonathan Feinberg, http://www.wordle.net/). The size of text in the word clouds indicates the number of OTUs within a given family. [...] The abundance data (intensity) for the six largest classes detected in Texas air samples, as per Table 1 of Brodie et al., were used to investigate whether probe set results were independent. A random subset of Brodie's samples was used (SA_wk34_ttc, AU_wk19_ttc, AU_wk20_ttc, AU_wk21_ttc, AU_wk22_ttc, AU_wk23_ttc, AU_wk24_ttc, AU_wk25_ttc, AU_wk27_ttc, AU_wk28_ttc, AU_wk29_ttc, AU_wk32_ttc, SA_wk19_ttc, SA_wk20_ttc, SA_wk21_ttc, SA_wk22_ttc, SA_wk23_ttc, SA_wk33_ttc).
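The in silico hybridisation step described earlier in this section (25-mer lookup against per-OTU probe tables, then a presence threshold) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `kmers`, `hybridise` and `target_matches`, the dictionary-based inputs, and the use of Python sets in place of explicit hash tables are all assumptions made for illustration. Reverse complements and FASTA parsing are deliberately left out.

```python
from collections import defaultdict


def kmers(seq, k=25):
    """Return the set of all k-mers in a sequence (25-mers by default)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hybridise(references, probe_sets, k=25, threshold=0.90):
    """Look up every k-mer of every reference in every OTU probe set.

    references: dict mapping a sequence name to its sequence (hypothetical layout)
    probe_sets: dict mapping an OTU id to a set of probe strings (hypothetical layout)
    Returns (matched, present): the unique probes matched per OTU, and whether
    more than `threshold` of each OTU's probes were matched by any reference.
    """
    matched = defaultdict(set)
    for seq in references.values():
        sample_kmers = kmers(seq, k)
        for otu_id, probes in probe_sets.items():
            # Set intersection plays the role of the hash-table lookup.
            matched[otu_id] |= probes & sample_kmers
    present = {}
    for otu_id, probes in probe_sets.items():
        fraction = len(matched[otu_id]) / len(probes) if probes else 0.0
        present[otu_id] = fraction > threshold
    return matched, present


def target_matches(references, probe_sets, target_otu, k=25):
    """List (reference name, probe) pairs that hit one 'target' OTU's probe set."""
    probes = probe_sets[target_otu]
    hits = []
    for name, seq in references.items():
        for probe in sorted(probes & kmers(seq, k)):
            hits.append((name, probe))
    return hits
```

Calling `hybridise` with `threshold` set to 0.90, 0.92 or 0.95 would correspond to the three presence cut-offs mentioned in the text, and `target_matches` shows which reference sequences contribute probes to a chosen OTU, which is how shared probes between unrelated organisms could be inspected.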
Pearson's correlation coefficients between the abundances of OTUs within each class were calculated in Stata/SE 11.0. Histograms with a bin size of 0.02 were plotted in SigmaPlot, and the counts in each bin were scaled to give the same area under the curve. The distribution of Pearson's correlation coefficient expected if the abundances of OTUs were independent of each other was calculated in R, using the SuppDists package to find p-values for n equal to 18 and then scaling these p-values to give the same area under the curve as the data plots. All scaled counts were plotted in MATLAB version 7.7.0 (R2008b). […]
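The comparison between observed and expected correlation distributions can be sketched in Python as follows. The original analysis used Stata, SigmaPlot, R (SuppDists) and MATLAB; this sketch swaps in the standard closed-form density of the sample correlation coefficient for n independent bivariate-normal pairs, f(r) = (1 - r^2)^((n-4)/2) / B(1/2, (n-2)/2), which is an assumption about the null model rather than a reproduction of the authors' procedure. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_null_pdf(r, n=18):
    """Density of r under independence (rho = 0), assuming bivariate normality."""
    if abs(r) >= 1:
        return 0.0
    # log of the beta function B(1/2, (n - 2) / 2)
    log_beta = (math.lgamma(0.5) + math.lgamma((n - 2) / 2)
                - math.lgamma((n - 1) / 2))
    return math.exp((n - 4) / 2 * math.log(1 - r * r) - log_beta)


def scaled_histogram(values, width=0.02):
    """Histogram of correlations on [-1, 1], scaled so total area equals 1."""
    nbins = round(2 / width)
    counts = [0] * nbins
    for v in values:
        idx = min(int((v + 1) / width), nbins - 1)  # r = 1 goes in the last bin
        counts[idx] += 1
    total = len(values)
    return [c / (total * width) for c in counts]


def null_curve(n=18, width=0.02):
    """Null density evaluated at bin centres, comparable to scaled_histogram."""
    nbins = round(2 / width)
    centres = [-1 + (i + 0.5) * width for i in range(nbins)]
    return centres, [pearson_null_pdf(c, n) for c in centres]
```

With 18 samples per OTU, `scaled_histogram` applied to all pairwise `pearson_r` values within a class and `null_curve(n=18)` share the same unit area, so the two can be overlaid directly to judge whether the observed correlations are more extreme than independence would predict.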

Pipeline specifications

Software tools: RDP Classifier, SigmaPlot
Applications: Miscellaneous, Phylogenetics, 16S rRNA-seq analysis
Diseases: Pulmonary Fibrosis