Computational protocol: Metatranscriptomics reveals the molecular mechanism of large granule formation in granular anammox reactor

[…] Paired-end Illumina reads from each of the five samples were interleaved and checked for quality with FastQC. Bases with a Phred score greater than 20 were retained. Redundant sequences were removed using a normalization script packaged in Khmer with a k-size and coverage of 20, which reduced the data to approximately 10% of the initial reads. Paired reads were assembled into contigs using Trinity (v2.1.0). Taxonomic annotations were assigned to contigs using the BLASTn algorithm (v2.2.28+) against the nucleotide (nt) database, with an E-value cutoff of less than 1e−6 and a percentage identity greater than 90%. Contigs were annotated against the Clusters of Orthologous Groups (COG) CDD database (v1.0) using the rpstblastn algorithm and against the KEGG genes database using BLASTp, both with an E-value less than 1e−6. A manually curated database of important genes in nitrogen metabolism was queried against all assembled contigs using tBLASTx. TPM (transcripts per million) was calculated using RSEM (RNA-Seq by Expectation Maximization).

To better compare samples, raw reads were submitted to MG-RAST for annotation. Using the Hierarchical Classification feature of MG-RAST, KEGG Orthology (KO) annotations were generated using an E-value cutoff of 1e−6, a minimum sequence identity of 60%, and a minimum alignment length of 15 bp. Abundances of the resulting functional annotations were normalized to rpoB abundance. As KEGG does not contain the gene for hydrazine synthase (hzsA), all nucleotide sequences matching hzsA were downloaded from NCBI, and raw reads were searched against these hzsA sequences using BLASTn with an E-value cutoff of 1e−6. The number of hits was also normalized to rpoB abundance in each sample. Using the rpoB-normalized values, relative abundance bar plots and heatmaps were generated in R using the gplots package. […]
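The thresholds and normalizations described above can be illustrated with a short Python sketch. It is not the authors' code: the function names (filter_hits, tpm, normalize_to_rpob) are hypothetical, the BLAST results are assumed to be in the default 12-column tabular layout (-outfmt 6), and the TPM function shows only the basic formula, whereas RSEM additionally estimates read assignments and uses effective transcript lengths.

```python
from typing import Dict, Iterable, List, Tuple

# Cutoffs quoted in the protocol for the contig-level BLASTn search.
MAX_EVALUE = 1e-6
MIN_IDENTITY = 90.0


def filter_hits(
    lines: Iterable[str],
    max_evalue: float = MAX_EVALUE,
    min_identity: float = MIN_IDENTITY,
) -> List[Tuple[str, str, float, float]]:
    """Keep hits with E-value < max_evalue and identity > min_identity.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and pident > min_identity:
            kept.append((fields[0], fields[1], pident, evalue))
    return kept


def tpm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Basic TPM: length-normalized rates rescaled to sum to one million."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def normalize_to_rpob(
    abundances: Dict[str, float], reference: str = "rpoB"
) -> Dict[str, float]:
    """Divide every gene abundance in one sample by that sample's rpoB abundance."""
    ref = abundances.get(reference, 0)
    if ref <= 0:
        raise ValueError(f"no {reference} abundance available for normalization")
    return {gene: value / ref for gene, value in abundances.items()}


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the study.
    sample = {"rpoB": 50, "hzsA": 100, "nirS": 25}
    print(normalize_to_rpob(sample))
    print(tpm({"contig_a": 10, "contig_b": 20}, {"contig_a": 1000, "contig_b": 1000}))
```

Normalizing to rpoB within each sample expresses every functional gene relative to a housekeeping gene, which is what makes the per-sample values comparable before they are plotted.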

