Computational protocol: Baseline survey of the anatomical microbial ecology of an important food plant: Solanum lycopersicum (tomato)

Protocol publication

[…] The Data Intensive Academic Grid (DIAG) computational cloud (http://diagcomputing.org) was used in combination with the CloVR-16S automated pipeline (version 1.1) to perform computationally intensive tasks, such as chimera detection and nonparametric statistical analyses, on the 16S rRNA gene sequences. The CloVR-16S pipeline uses tools for phylogenetic analysis of 16S rRNA data from QIIME and mothur for sequence processing and diversity analysis, the RDP Bayesian classifier for taxonomic assignment, UCHIME for chimera detection and removal, Metastats for statistical comparisons of sample groups, and various R programs for visualization and unsupervised clustering. A full description of the CloVR-16S standard operating procedure (SOP) is available online at http://clovr.org. [...]

We used the approximately-maximum-likelihood method for phylogenetic inference implemented in FastTree to further explore the taxonomic identity of Enterobacteriaceae sequences from the different regions of tomato plants. Reference sequences from Enterobacteriaceae and other phyla observed in the samples were used with Salmonella reference sequences from NCBI (Additional file : Table S2). Inference was performed using the default settings. Clustering of individuals using the program STRUCTURE was performed with K = 2 and K = 3. [...]

Sequences were clustered stringently using the QIIME UCLUST module set to a 99% identity threshold. The representative of each cluster (i.e., the longest read in the cluster) was examined for chimeras using UCHIME in de novo mode. Clusters identified as chimeras were removed from further analysis. The remaining representatives were searched against the SILVA rRNA small subunit (SSU) database (limited to reference sequences with full taxonomic identification) with BLASTN, using an e-value threshold of 1e-5.
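The representative-selection and chimera-removal steps described above can be illustrated with a minimal Python sketch. It is not the CloVR or QIIME implementation: the function names, the in-memory cluster layout, and the assumption that chimeric read IDs have already been parsed from a chimera checker's output are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Read = Tuple[str, str]  # (read_id, sequence); hypothetical in-memory layout


def pick_representatives(clusters: Dict[str, List[Read]]) -> Dict[str, Read]:
    """Pick the longest read of each cluster as its representative.

    Ties on length are broken by the smallest read_id so the result
    is deterministic (an assumption; the source does not specify this).
    """
    reps: Dict[str, Read] = {}
    for cluster_id, reads in clusters.items():
        if not reads:
            continue  # skip empty clusters
        reps[cluster_id] = min(reads, key=lambda r: (-len(r[1]), r[0]))
    return reps


def drop_chimeric(reps: Dict[str, Read], chimeric_ids: Set[str]) -> Dict[str, Read]:
    """Remove every cluster whose representative was flagged as chimeric."""
    return {cid: rep for cid, rep in reps.items() if rep[0] not in chimeric_ids}


# Toy data: two clusters, one of which has a chimeric representative.
clusters = {
    "c1": [("r1", "ACGT"), ("r2", "ACGTAC")],
    "c2": [("r3", "GG")],
}
reps = pick_representatives(clusters)
kept = drop_chimeric(reps, chimeric_ids={"r3"})
```

In this toy example, `r2` represents cluster `c1` because it is the longest read, and cluster `c2` is discarded because its representative `r3` is in the chimeric set.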
To provide information about overall fungal distribution, each 99% identity cluster was assigned the taxonomy of its closest known neighbor, that is, the best BLAST hit to the cluster's representative sequence. [...]

Whole genome shotgun (WGS) metagenomic sequences were provided as input to the CloVR-Metagenomics pipeline (version 1.0), using the no-ORFs (no open reading frames) option, and to the MG-RAST metagenomics analysis server (version 3.2; Argonne National Laboratory, Argonne, IL; http://metagenomics.anl.gov). Different maximum e-value cutoffs, minimum percentage identity cutoffs, and minimum alignment length cutoffs were used for different questions (see individual list in Results section). For overall phylogenetic designation at the phylum level, the default parameters were 80% similarity over 100 bases at an e-value of 1e-5.

CloVR-Metagenomics was used with a BLAST-based protocol to perform taxonomic and functional annotations as well as statistical analysis with Metastats and R. The CloVR pipeline for metagenomes was used with the following SOP steps:

1) UCLUST first clusters redundant sequences that show 99% nucleotide identity and removes artificial 454 replicate reads.
2) Representative DNA sequences are searched against the NCBI COG database using BLASTX.
3) Representative DNA sequences are searched against the NCBI RefSeq database of finished prokaryotic genomes using BLASTN.
4) Metastats and CloVR-implemented R scripts are applied for additional statistical and graphical evaluations of the pipeline results.

Functional annotation was examined using the COGs database. A full description of the CloVR-Metagenomics SOP is available online at http://clovr.org. […]
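The cutoff-based filtering and best-hit assignment described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used by CloVR or the metagenomics server: it assumes BLAST results exported in the standard 12-column tabular format, treats "similarity" as the percent-identity column, and uses hypothetical function names. The default thresholds mirror the 80% over 100 bases at 1e-5 setting quoted in the text.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity
    length: int      # alignment length
    evalue: float
    bitscore: float


def parse_hit(line: str) -> Hit:
    """Parse one line of 12-column BLAST tabular output (assumed format)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def best_hits(
    lines: Iterable[str],
    max_evalue: float = 1e-5,
    min_pident: float = 80.0,
    min_length: int = 100,
) -> Dict[str, Hit]:
    """Keep, per query, the highest-bitscore hit that passes all cutoffs."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = parse_hit(line)
        if hit.evalue > max_evalue or hit.pident < min_pident or hit.length < min_length:
            continue  # fails at least one cutoff
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Toy input: q1 has two passing hits, q2 fails identity, q3 fails length.
example = [
    "q1\tsA\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-50\t270",
    "q1\tsB\t85.0\t120\t18\t0\t1\t120\t1\t120\t1e-20\t150",
    "q2\tsC\t75.0\t200\t50\t0\t1\t200\t1\t200\t1e-30\t100",
    "q3\tsD\t95.0\t60\t3\t0\t1\t60\t1\t60\t1e-8\t90",
]
result = best_hits(example)
```

Passing different values for `max_evalue`, `min_pident`, and `min_length` corresponds to using different cutoffs for different questions; the taxonomy of each retained subject would then be looked up in whatever reference annotation is available.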

Pipeline specifications

Software tools: CloVR, QIIME, mothur, UCHIME, Metastats, FastTree, UCLUST, BLASTN, BLASTX
Databases: COGs
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Solanum lycopersicum, Homo sapiens
Diseases: Salmonella Infections