Computational protocol: Phylogeny of tremellomycetous yeasts and related dimorphic and filamentous basidiomycetes reconstructed from multiple gene sequence analyses

[…] Sequences were inspected and assembled using the SeqMan program in the Lasergene 7 software package (DNASTAR Inc., Madison) and were then aligned with Clustal X 1.83. Spliceosomal intron regions were inferred from insertions with canonical splice sites (GT-AG, GC-AG, AT-AC) in the nucleotide sequence alignments between our data and reference cDNA sequences from GenBank. Exon sequences of the protein-coding genes RPB1, RPB2, TEF1 and CYTB were manually aligned using MEGA 5. Positions deemed ambiguous to align were excluded manually. Thereafter, the multiple sequence alignments for ITS, D1/D2, SSU, RPB1, RPB2, TEF1, and CYTB were concatenated into a combined file.

Maximum likelihood (ML), neighbour-joining (NJ), and Bayesian analyses were conducted for separate and combined nucleotide data sets using RAxML v8.1.X, MEGA 5.0 and MrBayes 3.2.1, respectively. ML analysis was implemented with the novel fast bootstrap algorithm with 100 replicates and a subsequent search for the best maximum-likelihood tree in conjunction with the GTRGAMMAI model approximation. NJ analysis was performed on evolutionary distances calculated with Kimura's two-parameter model. Bootstrap analyses were performed from 1 000 random re-samplings in both ML and NJ analyses. A bootstrap proportion (BP) above 70 % obtained from the ML and NJ analyses was considered significant.

Bayesian analysis was implemented by applying heterogeneous models to the data set with seven unlinked partitions, one for each gene. The best-fit evolutionary model for each gene fragment was determined using the Bayesian Information Criterion (BIC) in jModelTest. The ITS, D1/D2, and SSU rDNA gene sequences were fitted to the TPM3uf+G, TIM3+G, and TIM2+I+G models, respectively. The protein-coding genes RPB1 and CYTB both used the GTR+I+G model, whereas RPB2 and TEF1 used the TPM3uf+I+G and TPM1uf+G models, respectively.
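
The BIC-based model choice described above can be illustrated with a minimal sketch. It assumes the usual definition BIC = -2 ln L + k ln n, with n taken as the number of alignment sites and k as the number of free parameters; both choices are assumptions here, and the log-likelihoods and parameter counts in the example are hypothetical values, not results from this study. The function names `bic` and `best_model_by_bic` are illustrative and are not part of jModelTest.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model_by_bic(candidates: dict, n_sites: int):
    """Return the lowest-BIC model name and all scores.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {
        name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical values for illustration only; not taken from the study.
example_candidates = {
    "GTR+I+G": (-5230.4, 10),
    "TPM3uf+G": (-5236.1, 6),
}
best, scores = best_model_by_bic(example_candidates, n_sites=600)
```

In this made-up example the simpler model wins because its slightly lower likelihood is outweighed by the smaller parameter penalty, which is the trade-off BIC is designed to capture.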
Six to fifty million generations were run with four Markov chains (three heated and one cold), sampling every 500 generations. Convergence of the two independent runs was judged by an average standard deviation of split frequencies below 0.01. Clades with posterior probabilities (PP) above 0.95 were considered significantly supported. […]
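
The convergence criterion can be sketched as follows. This is an illustrative calculation, not the MrBayes implementation: it assumes each run is summarised as a mapping from a split (bipartition) label to its frequency among sampled trees, uses the sample standard deviation across runs, and ignores splits whose frequency is below a cut-off in every run. The function name `asdsf`, the `min_freq` default of 0.10, and the split labels are assumptions made for the example.

```python
from statistics import mean, stdev


def asdsf(run_split_freqs: list, min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across runs.

    run_split_freqs: one dict per run, mapping split label -> frequency.
    Splits absent from a run are counted with frequency 0.0.
    Requires at least two runs.
    """
    all_splits = set().union(*run_split_freqs)
    deviations = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in run_split_freqs]
        # Assumption: rare splits (below min_freq in every run) are skipped.
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return mean(deviations) if deviations else 0.0


# Hypothetical split frequencies from two independent runs.
run1 = {"AB|CD": 0.98, "AC|BD": 0.02}
run2 = {"AB|CD": 0.97, "AC|BD": 0.03}
converged = asdsf([run1, run2]) < 0.01  # threshold stated in the protocol
```

Only the 0.01 threshold comes from the protocol; the exact standard deviation formula and frequency cut-off used by MrBayes should be checked against its manual.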

