Computational protocol: Genome mining for peptidases in heat-tolerant and mesophilic fungi and putative adaptations for thermostability

Protocol publication

[…] The annotated genomes of the thermophilic (sensu Oliveira et al. []), thermotolerant and mesophilic species listed in Table  were retrieved from public databases, including the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/), the DOE Joint Genome Institute (JGI; http://genome.jgi.doe.gov/) and Genozyme (http://genome.fungalgenomics.ca/).

We inferred a phylogenetic tree to evaluate the evolutionary relationships among the selected species. A super-alignment of the selected fungal proteomes was constructed with the Hal pipeline [], allowing for no missing data. Poorly aligned positions and positions containing gaps were removed with Gblocks 0.91b [] under the following stringent parameters: the maximum number of contiguous non-conserved positions was limited to six amino acids, and the minimum block length was set to 15 amino acids. This produced an alignment 106,488 positions long, which was used to estimate the phylogeny. The best-fitting model of protein evolution was selected with ProtTest 3.2.1 []. The species tree was generated in PhyML 3.3 [] under the LG model of evolution, with approximate Bayes (aBayes) branch supports. The ProtTest estimates of the α-parameter of the γ-distribution with six substitution rate categories (1.019) and of the proportion of invariable sites (0.067) were used. The phylogeny data, including alignments, are available in the TreeBASE repository (http://purl.org/phylo/treebase/phylows/study/TB2:S22179). [...]

Few annotated fungal genomes of the order Mucorales were present in the databases. We therefore used the MAKER pipeline to annotate the previously unannotated genome of T. indicae-seudeticae (Mucorales: Lichtheimiaceae; GenBank accession number JSYX01.1).
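As an aside on the alignment trimming described earlier in this section, the following minimal Python sketch illustrates only two of the criteria mentioned there: dropping every column that contains a gap and discarding retained blocks shorter than a minimum length. It is not a reimplementation of Gblocks, which also applies conservation thresholds. The function name `trim_alignment`, its parameters and the choice of gap characters are hypothetical and were chosen for illustration.

```python
from typing import List


def trim_alignment(seqs: List[str], min_block: int = 15,
                   gap_chars: str = "-?") -> List[str]:
    """Keep gap-free columns that sit in runs of at least `min_block` columns.

    Simplified illustration only: Gblocks additionally filters columns by
    conservation, which is not modelled here.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # True for every column in which no sequence carries a gap character.
    gap_free = [all(s[i] not in gap_chars for s in seqs) for i in range(length)]

    keep: List[int] = []
    start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(gap_free + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_block:
                keep.extend(range(start, i))
            start = None

    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    toy = ["ACDEFG-HIK", "ACDEFGAHIK"]
    # With a toy minimum block length of 4, the short block after the gap is dropped.
    print(trim_alignment(toy, min_block=4))
```

The default `min_block=15` mirrors the minimum block length quoted in the text; the limit of six contiguous non-conserved positions has no counterpart in this sketch.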
Since the transcriptome of this species was not available, we used the following data as evidence to support the annotation in the pipeline: (i) all proteins contained in the MEROPS protease database (downloaded 15 July 2016); (ii) all proteins of the Swissprot database (downloaded 15 July 2016); and (iii) the transcriptome of Lichtheimia ramosa (GenBank GCA_000945115.1), a related species that belongs to the same order. We used three gene predictors in the MAKER pipeline: (i) Semi-HMM-based Nucleic Acid Parser (SNAP) [], bootstrap-trained within MAKER; (ii) GeneMark-ET [], trained in unsupervised mode; and (iii) Augustus [], trained for Rhizopus oryzae. [...]

We mined the proteomes of all investigated fungi for putative protease sequences using BLAST searches against the MEROPS peptidase database [] (http://merops.sanger.ac.uk/). The putative peptidases were classified by catalytic type and family by the MEROPS server. An analysis of similarity (ANOSIM) was performed to test for differences in catalytic-type composition between mesophilic and thermophilic species, and a similarity percentage (SIMPER) analysis was applied to identify which catalytic type contributed most to the differences in the enzymatic profile. We conducted the same analyses to evaluate differences in the composition of peptidase families.

The percentage of each amino acid and the percentages of charged, polar and hydrophobic amino acids were calculated with the PEPSTATS utility of the EMBOSS suite. We carried out a paired t-test to determine whether individual amino acids or amino acid groups differed significantly between thermophilic and mesophilic species in the set of all proteins (114,946 and 102,521 proteins, respectively) and in the set of peptidases (3340 and 3590 peptidases, respectively); thermotolerant species were excluded from this analysis. All analyses were performed in Past v. 2.17c [].
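The composition comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' workflow, which used PEPSTATS and Past: the helper names (`composition`, `group_percent`, `paired_t`) are hypothetical, the residue group passed to `group_percent` is an illustrative choice that may differ from the PEPSTATS property classes, and the way thermophilic and mesophilic values are paired is not specified in the text and is assumed here to be given by the order of the two input lists.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Iterable, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins: Iterable[str]) -> Dict[str, float]:
    """Percentage of each standard amino acid over a whole set of proteins."""
    counts: Counter = Counter()
    for seq in proteins:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def group_percent(comp: Dict[str, float], residues: str) -> float:
    """Summed percentage of a residue group (the group definition is the caller's choice)."""
    return sum(comp[aa] for aa in residues)


def paired_t(meso: Sequence[float], thermo: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for thermo minus meso differences."""
    if len(meso) != len(thermo) or len(meso) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [t - m for m, t in zip(meso, thermo)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


if __name__ == "__main__":
    comp = composition(["AAKK", "AK"])
    print(round(comp["A"], 1), round(group_percent(comp, "DEKR"), 1))
    # Hypothetical per-pair percentages of one residue, mesophile vs thermophile.
    print(paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 3.0]))
```

Differences are taken as thermophilic minus mesophilic, so a positive t statistic corresponds to an increase from mesophilic to thermophilic species. The sketch stops at the test statistic; converting it to a p-value requires a t-distribution table or a statistics package.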
All results are presented as the changes from mesophilic to thermophilic species. [...]

We chose the A1A aspartic peptidase (AP) family because it is the best characterized. The dataset was scrutinized for the presence of the typical AP hallmarks, defined as D[TS]G, Y, XXG, D[TS]G and XXG (where X is any of the hydrophobic residues AFILMV). Sequences lacking any of the hallmarks were considered non-functional homologues and excluded from further analysis. We made the first alignment manually, anchored on the catalytic D[TS]G motif as described in Revuelta et al. []. The second alignment was performed with ClustalW [].

We performed a phylogenetic analysis in MEGA7 [] to identify a cluster of functional sequences. The evolutionary history was inferred using the Maximum Likelihood method based on the JTT matrix-based model []. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value.

From the initial tree, a cluster of 12 protein sequences was selected. This cluster was composed of proteins from Aspergillus fumigatus, A. niger, Chaetomium globosum, C. thermophilum, Myceliophthora fergusii, M. sepedonium, M. thermophila, Myriococcum thermophilum, Rasamsonia byssochlamydoides, Thermoascus crustaceus, Thielavia terrestris and T. australiensis. […]
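The hallmark screen described above (D[TS]G, Y, XXG, D[TS]G, XXG, with X one of AFILMV) lends itself to a regular-expression filter. The sketch below is a hedged illustration, not the authors' procedure: it assumes that the five hallmarks must occur in the listed order with arbitrary spacing between them, which the text does not state explicitly, and the names `HALLMARKS`, `has_all_hallmarks` and `filter_functional` are hypothetical.

```python
import re
from typing import Dict

# Assumption: the hallmarks occur in the listed order, separated by any
# number of residues. X is one of the hydrophobic residues AFILMV.
HALLMARKS = re.compile(
    r"D[TS]G"            # first catalytic motif
    r".*?Y"              # tyrosine
    r".*?[AFILMV]{2}G"   # first XXG
    r".*?D[TS]G"         # second catalytic motif
    r".*?[AFILMV]{2}G"   # second XXG
)


def has_all_hallmarks(seq: str) -> bool:
    """True if the sequence contains all five hallmarks in the assumed order."""
    return HALLMARKS.search(seq.upper()) is not None


def filter_functional(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep only sequences that carry every hallmark; the rest are dropped."""
    return {name: s for name, s in seqs.items() if has_all_hallmarks(s)}


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real peptidases.
    toy = {
        "complete": "MKDTGSSYPPAVGQQDSGTTLIGK",
        "truncated": "MKDTGSSYPPAVGQQDSGTT",
    }
    print(sorted(filter_functional(toy)))
```

A sequence that passes this filter only satisfies the motif criterion; the manual alignment and phylogenetic clustering described in the text are separate steps that this sketch does not cover.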

Pipeline specifications

Software tools: Hal, Gblocks, ProtTest, PhyML, PhyloWS, SNAP, GeneMark-ET, EMBOSS, Clustal W, MEGA
Databases: TreeBASE
Applications: Phylogenetics, Transcription analysis
Diseases: Heat Stroke
Chemicals: Amino Acids