Computational protocol: An extended phylogenetic analysis reveals ancient origin of "non-green" phosphoribulokinase genes from two lineages of "green" secondary photosynthetic eukaryotes: Euglenophyta and Chlorarachniophyta

Protocol publication

[…] Most Class II PRK sequences were retrieved from the National Center for Biotechnology Information (NCBI) and the Joint Genome Institute (JGI). In this analysis, in addition to five new PRK sequences, sequences from one brown alga (Ectocarpus siliculosus), seven chlorophytes, and several other available taxa were added to the OTUs used previously [,]. Sequences of PRK genes from two charophycean algae, the Closterium peracerosum-strigosum-littorale complex (Closterium psl complex) and Chara braunii, were obtained from unpublished assembled EST data (Nishiyama pers. comm.). The aa sequences of PRK from 42 eukaryotic ingroup and 14 cyanobacterial outgroup OTUs (including five genes sequenced in this study; Additional file ) were aligned using SeaView [], and ambiguous sites were removed from the alignment to produce a data matrix of 327 aa from 60 OTUs (available from TreeBASE; study ID: s11802) (Additional file ). All of the PRK nucleotide sequences used in the present study cover more than 300 aa within the 327 aa alignment, except for EST database-retrieved sequences from the streptophytes Artemisia annua (230 aa) and Beta vulgaris (262 aa), the glaucophyte Cyanophora paradoxa (153 aa), and the dinoflagellate Amphidinium carterae (292 aa). The following phylogenetic analyses were carried out after excluding four dinoflagellate sequences that exhibit long branches and cause low phylogenetic resolution (Additional file ).

Bayesian inference (BI) was conducted using MrBayes (ver. 3.1.2; []) with the WAG+I+Γ4 model. BI consisted of two parallel runs, each with four incrementally heated Markov chain Monte Carlo (MCMC) chains, run for 1,000,000 generations with sampling every 100 generations. The first 25% of the generations were discarded as burn-in, and the remaining trees were used to calculate a 50% majority-rule consensus tree and determine the posterior probabilities (PP) of the individual branches.
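The burn-in and posterior-probability step can be illustrated with a minimal Python sketch. MrBayes performs this summary itself; the code below is not part of the published pipeline. It assumes, purely for illustration, that each sampled tree has already been reduced to a set of splits (each split a `frozenset` of taxon names on one side of a branch). The function names `posterior_split_support` and `majority_rule_splits` are hypothetical.

```python
from collections import Counter


def posterior_split_support(sampled_trees, burnin_fraction=0.25):
    """Return {split: frequency} over the post-burn-in samples.

    Assumption: every sampled tree is an iterable of splits, and each
    split is a frozenset of taxon names on one side of a branch.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]  # discard the first 25% by default
    if not kept:
        raise ValueError("no samples left after burn-in")
    # Count each split at most once per tree.
    counts = Counter(split for tree in kept for split in set(tree))
    return {split: n / len(kept) for split, n in counts.items()}


def majority_rule_splits(support, threshold=0.5):
    """Keep only splits found in more than `threshold` of the samples."""
    return {s: p for s, p in support.items() if p > threshold}


# Toy example with eight samples; the first two are burn-in.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
samples = [{AC}, {AC}, {AB}, {AB}, {AB}, {AC}, {AB}, {AC}]
support = posterior_split_support(samples)
consensus = majority_rule_splits(support)
```

With the settings described above (1,000,000 generations sampled every 100), each run yields roughly 10,000 sampled trees, of which about 7,500 would remain after the 25% burn-in.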
The average standard deviation of split frequencies between the two MCMC runs was below 0.01 for each analysis, indicating convergence. In addition, bootstrap analyses with 1000 replicates were performed under the maximum likelihood (ML) method using both RAxML (ver. 7.0.3; []) and PhyML (ver. 3.0; []) with the WAG+I+Γ4 model. Maximum parsimony (MP) analysis was also run with PAUP* 4.0b10 [] using the nearest-neighbor-interchange search method to produce bootstrap values (BV) based on 1000 replicates.

In addition, we carried out two approximately unbiased tests (AU tests) [] to examine the phylogenetic positions of the two monophyletic groups, euglenophytes and chlorarachniophytes. As input, we used the alignment (327 aa) and two series of phylogenetic trees of PRK sequences in which the topology of all OTUs except either the euglenophytes or the chlorarachniophytes was fixed. All possible topologies were generated by re-grafting the branch of euglenophytes or chlorarachniophytes using an in-house Ruby script. For each pool of topologies, site-wise log-likelihood values were calculated using PhyML (WAG+F+I+Γ4 model) and used for the AU test conducted with CONSEL (ver. 0.1k; []).

Analyses of sedoheptulose-bisphosphatase (SBP) genes were also carried out based on 275 aa from 37 OTUs (available from TreeBASE; study ID: s11802) (Additional file ) representing a wide range of eukaryotic organisms (including two chlorarachniophyte sequences) (Additional file ), using the same phylogenetic methods as described above for the PRK genes.

Programs for BI, ML, and the AU test were executed on a supercomputer (Human Genome Center, University of Tokyo, Japan). […]
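The re-grafting step used to build the topology pools for the AU test can be sketched as follows. The authors' in-house Ruby script is not available, so this Python version only illustrates the general idea under stated assumptions: the fixed backbone is treated as a rooted binary tree written as nested 2-tuples, the group being tested is a single leaf or subtree, and the names `regraft_placements` and `to_newick` are hypothetical.

```python
def regraft_placements(tree, clade):
    """Yield every tree obtained by attaching `clade` to one branch of `tree`.

    Assumption: `tree` is a rooted binary tree given as nested 2-tuples
    with string leaves; its own topology is never changed.
    """
    # Attach on the branch leading to this node (for the root: above it).
    yield (tree, clade)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in regraft_placements(left, clade):
            yield (new_left, right)
        for new_right in regraft_placements(right, clade):
            yield (left, new_right)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string without branch lengths."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return tree


# Toy backbone with three taxa; "X" stands in for the re-grafted group.
backbone = (("A", "B"), "C")
placements = [to_newick(t) + ";" for t in regraft_placements(backbone, "X")]
```

A rooted binary backbone with n leaves has 2n - 1 branches (counting the one above the root), so the toy example gives five candidate topologies. Each Newick string could then be passed to a likelihood program to obtain the site-wise log-likelihoods that the AU test requires.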

Pipeline specifications

Software tools: SeaView, MrBayes, RAxML, PhyML, PAUP*, CONSEL
Databases: TreeBASE
Application: Phylogenetics
Organisms: Euglena gracilis, Bigelowiella natans