Computational protocol: Identification of horizontally transferred genes in the genus Colletotrichum reveals a steady tempo of bacterial to fungal gene transfer

Protocol publication

[…] The program Blast2GO 2.8 was used to annotate the HGT candidates with Gene Ontology (GO) terms. The annotations were verified with InterProScan 4, and the putative biochemical functions of the candidates were predicted with BRENDA (Release 2012.02), KEGG (update 13-12-2012) and MetaCyc 18.5. Additionally, the MEROPS 9.11 and CAZy databases were explored to further annotate the functions of proteases and carbohydrate-active enzymes. To predict the cellular localization of the candidates, WoLF PSORT v0.2 and the SignalP 4.1 server were used.

BLASTP (v2.2.29+) searches of the HGT candidates were performed against PHI-base V3.4, the Virulence Factors Database (VFDB) (Release 3) and the Database of Fungal Virulence Factors (DFVF) to detect proteins implicated in pathogenicity.

The expression data for C. graminicola and C. higginsianum were extracted from O’Connell et al. The data are presented in the Additional file : Table S2.

[...] To estimate the phylogenetic relationships of the species involved in HGT events, all fungal species found in the BLAST searches of the HGT candidates were used (54 species in total). The complete proteomes of these species were obtained from the UniProt, GenBank, Joint Genome Institute and Broad Institute databases (all databases downloaded 03-05-2013). To reconstruct the species tree, the amino acid sequences inferred from six nuclear genes were chosen. The selected proteins (FG533, FG570, FG832, MS277, MS413 and MS456) from FunyBase have been shown to be good phylogenetic markers for fungal species tree reconstruction, and for that reason they were used as queries in BLAST searches against the proteomes of the 54 fungal species. Homology was verified by building a multiple sequence alignment with MAFFT and reconstructing a tree with PhyML, to clarify the orthology relationships among taxa. The topological congruence among the protein trees was checked before accepting each protein in the analysis.
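As an illustration of the marker search described above, the BLAST results for each proteome could first be reduced to one candidate per marker by keeping the highest-scoring hit. This is a minimal sketch, not the authors' pipeline: it assumes the searches were written in BLAST+ tabular format (`-outfmt 6`), which the protocol does not state, and the subject identifiers and scores in the example are hypothetical.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row}, keeping the highest-bitscore hit per query."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical hits of two marker proteins against one proteome
# (made-up subject IDs and scores, for illustration only).
EXAMPLE = (
    "FG533\tsp1_prot0007\t78.2\t410\t89\t2\t1\t410\t5\t414\t1e-180\t612\n"
    "FG533\tsp1_prot0931\t41.0\t395\t220\t9\t8\t400\t12\t399\t3e-60\t201\n"
    "MS277\tsp1_prot0102\t66.5\t250\t83\t1\t1\t250\t1\t249\t2e-110\t388\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
for marker, row in sorted(hits.items()):
    print(marker, row["sseqid"], row["bitscore"])
```

A best hit is only a starting point. In the protocol, orthology is confirmed afterwards with the alignment and tree of each marker.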
The six proteins were concatenated for each taxon. The concatenated proteins were aligned with three different programs, MAFFT, MUSCLE and CLUSTALW, to evaluate how the choice of software affects the phylogenetic reconstruction. Once the alignment was chosen, trimAl, Gblocks and Guidance were used to edit it. A tree was reconstructed from each edited and unedited alignment with PhyML, using 100 bootstrap replicates. The alignment that produced the tree with the highest bootstrap values was selected as the final alignment. PartitionFinder was used to select substitution models for the final alignment, treating each protein as a partition. The models predicted by PartitionFinder were used in RAxML to calculate the maximum likelihood tree, starting from 100 random trees. Finally, a non-parametric bootstrap analysis with 100 replicates was performed and the results were summarized on the maximum likelihood tree.

[...] To estimate the putative age of the transferred genes, a fossil-calibrated molecular clock analysis was performed. To avoid convergence problems in the calculation of the calibrated tree, we selected 15 species from the 54 taxa used in the species tree. The same matrix of six concatenated proteins used for the species tree was used. The analysis was performed with the BEAST v1.7.5 software package. For each partition (each protein) the LG + I + G model was used. To allow uncorrelated rates of evolution across the tree, we used a lognormal relaxed clock model with a Yule process as the tree prior. A maximum likelihood tree estimated in PhyML under the LG + I + G model was used as the starting tree. To calibrate the tree, we used the estimated age of the Paleopyrenomycites devonicus fossil, 400 million years ago (mya), as the lower bound for the Pezizomycotina crown. A lognormal distribution with a mean of 460 (estimated from the results of Lucking et al.), a standard deviation of 1 and an offset of 400 was used as the prior for the time to the most recent common ancestor (TMRCA) of the Pezizomycotina.

Two independent BEAST runs of 15 million generations each were performed, sampling every 1500 generations. Convergence of the two runs was assessed visually with Tracer v1.5, and the log files and tree files were combined with LogCombiner v1.7.5 after discarding a percentage of the samples from each run, chosen in agreement with the Tracer plots (23.3% for each run). From the remaining trees, a maximum clade credibility tree was calculated using TreeAnnotator v1.7.5. The resulting tree was visualized with the FigTree software. […]
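The sampling figures quoted above translate into concrete sample counts, which can be worked out as follows. This is a minimal sketch of the arithmetic only. It assumes the initial state logged at generation 0 is not counted and that the 23.3% is applied to the number of samples in each run. Neither detail is stated in the protocol.

```python
# Values taken from the protocol text.
GENERATIONS = 15_000_000   # chain length of each BEAST run
SAMPLE_EVERY = 1_500       # sampling interval in generations
BURNIN_FRACTION = 0.233    # fraction discarded from each run
N_RUNS = 2                 # independent runs that were combined


def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, kept) sample counts for a single run."""
    total = generations // sample_every
    discarded = round(total * burnin_fraction)
    return total, discarded, total - discarded


total, discarded, kept = retained_samples(GENERATIONS, SAMPLE_EVERY, BURNIN_FRACTION)
burnin_generations = discarded * SAMPLE_EVERY
combined = kept * N_RUNS

print(f"samples per run:        {total}")
print(f"discarded per run:      {discarded} ({burnin_generations} generations)")
print(f"kept per run:           {kept}")
print(f"combined over {N_RUNS} runs:   {combined}")
```

Under these assumptions each run yields 10,000 samples, of which 2,330 are discarded, so the combined set passed to TreeAnnotator would hold 15,340 trees.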

Pipeline specifications