Computational protocol: Metatranscriptomics reveals temperature-driven functional changes in microbiome impacting cheese maturation rate

Protocol publication

[…] 16S rRNA amplicon reads were analyzed with QIIME 1.9.0, as previously reported. OTUs defined at 99% similarity were picked using the uclust pipeline, and the representative sequences were submitted to the RDPII classifier to obtain the taxonomy assignment and the relative abundance of each OTU using the Greengenes 16S rRNA gene database.
The whole-metatranscriptome data analysis was carried out as follows. Raw read quality (Phred scores) was evaluated with the FastQC toolkit (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adaptor and primer contamination was removed with CutAdapt. Low-quality bases (Phred score <20) were then trimmed and reads shorter than 60 bp were discarded with the SolexaQA++ software. Reads were aligned to a reference database using Bowtie2 in end-to-end, sensitive mode. The database was built by downloading the protein-coding portions of the genomes (.ffn files) from the NCBI RefSeq database (ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_BACTERIA/) and from http://patricbrc.org/portal/. The species included were chosen according to the 16S sequencing results, together with species commonly found in food ecosystems (listed in ). Since we did not carry out metagenomics and did not have the genomes of strains directly isolated from those samples, all the available sequenced genomes were included. The concatenated .ffn files were aligned against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (version April 2011) using mblastx to obtain the functional annotation and the gene taxonomy. The number of reads uniquely mapped to each gene in the database was extracted with SAMtools and normalized according to library size using custom scripts written in R (www.r-project.org). Only genes to which at least 5 reads/sample mapped were kept for subsequent analyses. Statistical analysis and plotting were carried out in R.
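The count normalization and filtering step can be illustrated with a minimal Python sketch. It is not the authors' custom R script, which is not shown in the source. The function name normalize_and_filter, the scaling to reads per million, the use of the summed per-gene counts as the library size, and the reading of "at least 5 reads/sample" as "at least 5 reads in every sample" are all assumptions made for illustration.

```python
def normalize_and_filter(counts, min_reads=5, scale=1_000_000):
    """Filter genes by raw read count and normalize by library size.

    counts: dict mapping gene ID -> list of raw read counts, one per sample.
    Returns (normalized counts for kept genes, library size per sample).
    """
    n_samples = len(next(iter(counts.values())))

    # Assumption: library size = total uniquely mapped reads in each sample,
    # computed over all genes before any filtering.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]

    kept = {}
    for gene, row in counts.items():
        # Assumption: a gene is kept only if every sample has >= min_reads.
        if all(c >= min_reads for c in row):
            # Scale each count to reads per `scale` mapped reads.
            kept[gene] = [c / lib_sizes[j] * scale for j, c in enumerate(row)]
    return kept, lib_sizes


# Hypothetical toy counts for three genes in two samples.
toy_counts = {"geneA": [10, 20], "geneB": [3, 50], "geneC": [90, 30]}
kept, lib_sizes = normalize_and_filter(toy_counts)
```

In this toy input, geneB is dropped because one sample has fewer than 5 reads; the remaining genes are expressed per million mapped reads so that samples with different sequencing depths become comparable.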
Differential gene expression analysis was performed with the Bioconductor package DESeq. P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure, and a false discovery rate (FDR) <0.05 was considered significant. Pairwise Spearman’s correlations between OTUs, KEGG genes, volatile organic compounds and biochemical indices were computed with the R package psych, and the significant ones (FDR < 0.05) were plotted in a correlation network using Cytoscape v. 2.8.1. Principal Component Analysis (PCA) and hierarchical clustering were carried out with the made4 package in R. All results are reported as mean values of two replicates. […]
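The Benjamini-Hochberg adjustment used for both the differential expression and the correlation p-values can be sketched in plain Python. This is an illustrative re-implementation of the standard procedure, not the code run by DESeq or psych; the function name benjamini_hochberg and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Tests whose adjusted p-value falls below the FDR threshold of 0.05.
significant = [i for i, p in enumerate(adj) if p < 0.05]
```

Each raw p-value is multiplied by the number of tests and divided by its rank, then capped by the adjusted value of the next larger p-value; tests with an adjusted value below 0.05 are the ones the protocol would call significant.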

Pipeline specifications

Software tools: QIIME, UCLUST, FastQC, cutadapt, Bowtie2, SAMtools, DESeq
Databases: KEGG, Greengenes
Applications: Metagenomic sequencing analysis, 16S rRNA-seq analysis
Chemicals: Amino Acids, Lactic Acid