Computational protocol: A complete annotation of the chromosomes of the cellulase producer Trichoderma reesei provides insights in gene clusters, their expression and reveals genes required for fitness

Protocol publication

[…] We used a completely manually curated annotation of T. reesei QM6a in this work. This was obtained by re-analysing all unidentified or ambiguously annotated genes deposited at the T. reesei genome website. To this end, we used BLASTP against the NCBI database (last accessed July 12, 2015), and used only hits with E values e−30) were considered “orphan proteins”. For orphans with no ESTs in the JGI and NCBI databases, we reinvestigated whether their reading frame was correct. While several cases of incorrect annotation were indeed detected, this did not result in a change from “orphan” to “unknown” or already identified genes (unpublished data). As a last step, we mapped the annotated genes onto the seven chromosomes, using the GRAAL-supported assembly of the T. reesei scaffolds []. The resulting database is available at: [...]

We used transcriptome data from our own earlier work. These included: cultivation of T. reesei QM 9414, an early cellulase-producing mutant, on D-glucose, glycerol, lactose and wheat straw (mechanically ground, and subjected to slightly acidic, thermochemical pre-treatment; obtained from Clariant Produkte Deutschland GmbH), respectively, in batch cultures [–], during induction of conidiation [], induction of cellulase gene expression by sophorose [] and at the onset of confrontation with the basidiomycete Thanatephorus solani []. All transcriptome data were obtained by oligonucleotide array hybridization, with the exception of the data for cultivation on glycerol and induction by sophorose, which were obtained by RNA deep sequencing. For array hybridization, a high-density oligonucleotide microarray (Roche-NimbleGen, Inc., Madison, WI) with 60-mer probes representing 9129 genes of T. reesei was used. Values were normalized by quantile normalization [] and the RMA algorithm [].
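
The quantile normalization step mentioned above can be illustrated with a minimal sketch. It assumes the textbook form of the method (within each array, the k-th smallest value is replaced by the mean of the k-th smallest values across all arrays) and is not the RMA implementation applied to the NimbleGen data. The function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per array).

    Assumption: plain textbook quantile normalization; ties are resolved
    by original position (stable sort), not by averaging ranks.
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")

    # For each array, the probe indices ordered from lowest to highest value.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]

    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n)
    ]

    # Write the reference value for rank k back to the probe holding rank k.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, probe_index in enumerate(order):
            out[probe_index] = reference[k]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Hypothetical intensities for three probes on two arrays.
    arrays = [[2.0, 4.0, 6.0], [1.0, 5.0, 3.0]]
    for column in quantile_normalize(arrays):
        print(column)
```

After normalization every array shares the same set of values, so differences between arrays reflect rank changes of individual probes rather than array-wide intensity shifts.
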
After elimination of transcripts that exhibited a standard deviation (SD) > 20 % of the mean value within replicates, false discovery rates [] were used to assess the significance of values. Data from RNA deep sequencing were analysed using the EOULSAN software version 1.2.2 []. To quantify the gene expression level, the relative transcript abundance was measured in reads per kb of exon per million mapped sequence reads (RPKM; []). All transcriptome data and the related protocols are available at the GEO web site under the accession numbers given in Table . […]
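
Two of the quantitative steps above, the replicate-variability filter and the RPKM measure, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the sample standard deviation is assumed (the text does not say which estimator was used), RPKM is computed from its standard definition rather than via Eoulsan, and the function names and example numbers are hypothetical.

```python
from statistics import mean, stdev


def passes_replicate_filter(replicates, max_relative_sd=0.20):
    """Return True if the SD within replicates is at most 20 % of their mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(replicates) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(replicates)
    if m <= 0:
        # A relative SD is undefined for a non-positive mean; drop the transcript.
        return False
    return stdev(replicates) / m <= max_relative_sd


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    RPKM = reads * 1e9 / (exon length in bp * total mapped reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical replicate intensities for two transcripts.
    print(passes_replicate_filter([100, 105, 95]))
    print(passes_replicate_filter([100, 160, 40]))
    # Hypothetical gene: 500 reads, 2 kb of exon, 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Dividing by both exon length and library size is what makes RPKM values comparable between genes of different length and between sequencing runs of different depth.
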

Pipeline specifications

Software tools: BLASTP, Eoulsan
Applications: Phylogenetics, RNA-seq analysis
Organisms: Trichoderma reesei
Chemicals: Cysteine, Iron, Nitrogen