Computational protocol: Multiple losses of photosynthesis and convergent reductive genome evolution in the colourless green algae Prototheca

[…] Annotated nuclear genes of C. variabilis, A. protothecoides, Helicosporidium sp., P. cutis, and P. stagnora were classified into known gene families using TreeFam 9 with an E-value cut-off of 1E-5. Plastid-related proteins were identified using PRIAM (March 2015) with an E-value cut-off of 1E-10. [...] We performed the phylogenetic analysis using 38 highly conserved plastid-encoded proteins (Supplemental Table ), equivalent to 6,467 amino acids, representing 42 taxa (Supplemental Table ). Organisms belonging to the core Trebouxiophyceae were used as an outgroup. The sequences were aligned using MAFFT 7.164b with the L-INS-i option, and poorly aligned regions were manually removed using MEGA 6.0. Model selection was carried out with IQ-TREE multicore v.1.3.2, and maximum likelihood (ML) analyses were performed with RAxML v.8.1.21 under the LG + GAMMA + I + F model. Statistical support was evaluated by nonparametric bootstrapping with 200 replicates. Bayesian analyses were performed using MrBayes v3.2.6 with the same substitution model. Bayesian inference consisted of 2,000,000 generations, sampled every 1,000 generations, using four Metropolis-coupled Markov chain Monte Carlo (MCMCMC) chains. Two separate runs were performed, and convergence was judged by the average standard deviation of split frequencies (ASDSF) falling below 0.01. Bayesian posterior probabilities (BPP) were calculated from the majority-rule consensus of the trees sampled after discarding the first 500 trees as burn-in.

We also performed phylogenetic analyses using the nucleus-encoded proteins of 7 taxa (P. cutis, P. stagnora, P. wickerhamii, Helicosporidium sp., A. protothecoides, C. variabilis, and C. subellipsoidea). Orthologous sequences among these taxa were identified by reciprocal best-hit analysis with the following cut-offs: similarity >70% and HSP coverage >50%.
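
The reciprocal best-hit criterion described above can be illustrated with a minimal Python sketch. It is not the script used in this study: the `Hit` record, the function names, the use of bit score to rank hits, and the choice to apply the similarity and coverage cut-offs before picking the best hit are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise search hit (hypothetical record layout)."""
    query: str
    subject: str
    similarity: float    # percent similarity of the HSP
    hsp_coverage: float  # percent of the query covered by the HSP
    bitscore: float      # assumed ranking criterion for "best" hit


def best_hits(hits, min_similarity=70.0, min_coverage=50.0):
    """Return {query: subject} for the top-scoring hit passing both cut-offs."""
    best = {}
    for hit in hits:
        # Cut-offs are strict, matching "similarity >70% and HSP coverage >50%".
        if hit.similarity <= min_similarity or hit.hsp_coverage <= min_coverage:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **cutoffs):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, **cutoffs)
    backward = best_hits(b_vs_a, **cutoffs)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

Calling `reciprocal_best_hits` on the hit lists from two opposite-direction searches returns the candidate ortholog pairs for that pair of taxa; repeating this across taxa would give the per-protein taxon occupancy.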
A total of 58 proteins, each shared by at least six taxa, were used for the analyses (Supplemental Table ). ML analyses were performed using the same method as for the plastid-encoded proteins. [...] The dN/dS ratios of the plastid-encoded ATP synthase genes and chlorophyll b reductase genes were calculated for P. cutis, P. wickerhamii, A. protothecoides, C. variabilis, and C. subellipsoidea. Amino acid sequences were aligned using MAFFT 7.164b with the L-INS-i option. The aligned amino acid sequences were converted to the corresponding codon (nucleotide) alignments using PAL2NAL v.14. Pairwise dN/dS ratios between C. variabilis and each of the other species were calculated using the codeml program of the PAML package v.4.8. […]
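
The conversion of a protein alignment into a codon alignment, which precedes the dN/dS calculation described above, can be sketched as follows. This is an illustrative simplification and not PAL2NAL itself: the function name `thread_codons` is hypothetical, and the sketch assumes an in-frame coding sequence whose codons map one-to-one onto the ungapped protein residues, with no check that each codon actually translates to the aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread a coding sequence onto a gapped protein alignment row.

    Each residue consumes one codon from `cds`; each gap becomes three
    gap characters. Nucleotides left over after the last residue
    (for example a stop codon) are ignored.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the aligned protein")

    codons = []
    pos = 0  # current offset into the ungapped coding sequence
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Applying the function to every row of the amino acid alignment yields a codon alignment in which gaps fall only between codons, which is the form of input that codon-based dN/dS estimation expects.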

