Computational protocol: Origin and Evolution of Water Oxidation before the Last Common Ancestor of the Cyanobacteria

Protocol publication

[…] All nonredundant PsbA (D1) and PsbD (D2) amino acid sequences were retrieved on January 2, 2014 using Position-Specific Iterated (PSI)-BLAST. To retrieve D1 and D2 subunits deposited in the RefSeq database, the D1 protein (accession number: YP_001864696, NpunF1022) from Nostoc punctiforme PCC 73102 was used as a search query. The search was restricted to the phylum Cyanobacteria, excluding environmental samples. Partial sequences were removed from the data set. In total, 490 sequences were retrieved: 360 D1 and 130 D2 subunits. Twenty D1 sequences representing the diversity of photosynthetic eukaryotes were added to the data set. All the sequences used in this study are available in supplementary text S1 (Supplementary Material online).

Sequences were aligned using the Clustal Omega algorithm, with a maximum of 104 guide tree iterations and 107 Hidden Markov Model iterations. All alignments are available on request. To confirm that the alignment was correct, the 3D structures of the D1 and D2 proteins (PDB ID: 3WU2) were superimposed using the CEalign plug-in for PyMOL (PyMOL Molecular Graphics System, Schrödinger, LLC), and structurally homologous positions were cross-checked against the alignment.

Maximum likelihood phylogenies were constructed using PhyML 3.1 with the LG model of amino acid substitution. The equilibrium frequencies and the proportion of invariant sites were set to be estimated by the software. Four gamma rate categories were used, with the gamma shape parameter left to be estimated by the program. The nearest neighbor interchange method was used for tree improvement. Branch support was calculated with the approximate likelihood ratio test option. Branch support of 0.7 (70%) was considered to be informative. Trees were plotted using Dendroscope 3.2.8.

Parsimony, distance, and Bayesian analyses were performed on a subset of sequences, and the resulting trees are shown in the supplementary information (supplementary fig. S3, Supplementary Material online). Parsimony was calculated by randomizing the sequence order ten times and with 1,000 bootstrap replicates. BioNJ was computed with 105 bootstrap replicates and with an observed distribution. Both the Parsimony and BioNJ trees were computed with SeaView 4.4.3. Bayesian analysis was performed with PhyloBayes 3.3 using the CAT mixture model to account for compositional heterogeneity across sites, applying relative exchange rates and four gamma rate categories (CAT + GTR + Γ). Four independent chains were run until convergence (26,000 cycles, maxdiff of 0.089). The first 5,200 trees were discarded as "burn-in," and the remaining trees from each chain were used to test for convergence and to compute the majority rule consensus tree.

Structural homology models were generated using the SWISS-MODEL online service (last accessed February 18, 2015). To produce models for the atypical D1 forms, we selected the atypical sequence from Gloeobacter kilaueensis JS-1 (WP_023174186.1) and the Group 1 and Group 2 D1 sequences from Chroococcidiopsis thermalis PCC 7203 (WP_015153111.1 and WP_015152761.1, respectively) and from Synechococcus sp. PCC 7335 (WP_006456314.1 and WP_006458236.1). These were modeled on the D1 subunit from the crystal structure of Photosystem II from Thermosynechococcus vulcanus at 1.9 Å. The amino acid numbering of the crystal structure was used throughout the text for clarity. Molecular models were visualized using PyMOL. […]
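The burn-in and consensus step of the Bayesian analysis can be illustrated with a short sketch. The protocol discards the first 5,200 of 26,000 cycles (20%) from each chain and builds a majority rule consensus from the remaining trees. The Python below is a minimal illustration of that idea, not PhyloBayes's own implementation: it assumes each sampled tree has already been reduced to its set of bipartitions (splits), and the names `canonical`, `majority_rule_splits`, and the five-taxon toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence

Split = FrozenSet[str]   # one side of a bipartition, given as taxon names
Tree = FrozenSet[Split]  # a sampled tree reduced to its non-trivial splits


def canonical(side: Split, taxa: Split, reference: str) -> Split:
    """Return the side of the bipartition that excludes the reference taxon,
    so that the same split is always represented by the same set."""
    return taxa - side if reference in side else side


def majority_rule_splits(
    chains: Sequence[Sequence[Tree]],
    burn_in: int,
    threshold: float = 0.5,
) -> Dict[Split, float]:
    """Drop the first `burn_in` trees of every chain, pool the rest, and
    return the splits whose pooled frequency exceeds `threshold`."""
    pooled: List[Tree] = [tree for chain in chains for tree in chain[burn_in:]]
    if not pooled:
        raise ValueError("no trees left after discarding burn-in")
    counts = Counter(split for tree in pooled for split in tree)
    total = len(pooled)
    return {s: n / total for s, n in counts.items() if n / total > threshold}


# Toy data (hypothetical): five taxa, two topologies, two short chains.
taxa = frozenset("ABCDE")


def tree(*sides: str) -> Tree:
    return frozenset(canonical(frozenset(s), taxa, "E") for s in sides)


t1 = tree("AB", "DE")  # ((A,B),C,(D,E))
t2 = tree("AC", "DE")  # ((A,C),B,(D,E))
chains = [[t2, t1, t1, t1], [t2, t2, t1, t2]]

# Discard the first tree of each chain as burn-in, then pool the chains.
support = majority_rule_splits(chains, burn_in=1)
for split, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(split)), round(freq, 3))
```

In this toy run the split AB|CDE is kept because it occurs in four of the six pooled trees, ABC|DE is kept because it occurs in all six, and AC|BDE is dropped because it occurs in only two. With real output, the splits would be extracted from the sampled tree files by a tree-parsing library, and `burn_in` would be 5,200 if the chains were sampled once per cycle.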

Pipeline specifications

Software tools: BLASTP, Clustal Omega, PyMOL, PhyML, Dendroscope, SeaView, PhyloBayes, SWISS-MODEL
Databases: ExPASy
Applications: Phylogenetics, Protein structure analysis
Organisms: Gloeobacter kilaueensis