Computational protocol: Exploring Components of the CO2-Concentrating Mechanism in Alkaliphilic Cyanobacteria Through Genome-Based Analysis

Protocol publication

[…] A bidirectional sequence alignment approach, namely reciprocal BLASTP, was employed to identify proteins of the 12 studied species that are homologous to the reference proteins of Synechocystis sp. PCC 6803. To avoid under- and over-estimation of sequence similarity among these related species, candidate orthologous proteins were selected using three BLAST statistics: E-value (≤ 10⁻⁶), sequence identity (≥ 30%), and coverage (≥ 60%). Only protein sequences that met all of these thresholds were further analyzed for conserved domains using the Pfam database 27.0, provided by the Sanger Centre, UK. The default E-value cut-off of 1.0 was applied for the domain search. The GUIDANCE web server was used to evaluate the confidence scores of the multiple sequence alignments. Additionally, the genomic features were visualized with GView. [...]

A phylogenetic tree of the 12 selected strains and reference cyanobacteria was constructed from Rubisco large subunit (RbcL) amino acid sequences, which were used to infer protein function and classification among the strains. Further phylogenetic trees, based on the protein sequences of CmpABCD of the HCO3− transporter BCT1 and of NrtABCD of the nitrite/nitrate transporter, were constructed to confirm the identity of these proteins. The reference species were selected according to their carboxysome type (α- or β-class), the presence of both CmpABCD and NrtABCD transporters in their genomes, or their habitats. These strains included freshwater (Anabaena sp. PCC 7120, Anabaena variabilis ATCC 29413, Cyanothece sp. PCC 8801, Cyanothece sp. PCC 8802, Nostoc punctiforme ATCC 29133, Synechococcus sp. PCC 7942, and Synechocystis sp. PCC 6803) and marine (Lyngbya sp. PCC 8106, Trichodesmium erythraeum IMS101, Synechococcus sp. PCC 7002, Synechococcus sp. CC9605, Synechococcus sp. CC9902, Prochlorococcus marinus AS9601, NATL1A, and NATL2A, and Prochlorococcus marinus MIT 9211, 9215, 9301, 9303, 9312, 9313, and 9515) cyanobacteria. Their amino acid sequences were retrieved from public databases, including CyanoBase and GenBank. The sequences were aligned with MUSCLE, and the trees were then inferred by the Maximum Likelihood method in MEGA 6.0. The reliability of the branches was estimated by the bootstrap method with 3000 replications. […]
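
The reciprocal BLASTP filtering described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that each BLASTP run has been reduced to six whitespace-separated columns (query, subject, percent identity, percent query coverage, E-value, bit score), that coverage is measured on the query, and that the best hit is the one with the highest bit score. The sequence identifiers and hit values are made up for illustration; only the three thresholds come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent query coverage (assumed definition)
    evalue: float
    bitscore: float


# Thresholds quoted in the protocol text.
MAX_EVALUE = 1e-6
MIN_IDENTITY = 30.0
MIN_COVERAGE = 60.0

# Hypothetical hit tables: reference proteins vs. a target proteome, and back.
FORWARD = """
ref_A tgt_1 72.5 98 1e-120 350
ref_A tgt_2 35.0 80 1e-20 90
ref_B tgt_3 28.0 90 1e-10 60
ref_C tgt_4 45.0 50 1e-30 110
ref_D tgt_5 55.0 85 1e-60 200
"""
REVERSE = """
tgt_1 ref_A 72.5 97 1e-118 348
tgt_2 ref_A 35.0 82 1e-19 88
tgt_5 ref_E 60.0 90 1e-70 220
tgt_5 ref_D 55.0 86 1e-58 198
"""


def parse_hits(text: str) -> list[Hit]:
    """Parse the assumed six-column hit table into Hit records."""
    hits = []
    for line in text.strip().splitlines():
        q, s, ident, cov, ev, bits = line.split()
        hits.append(Hit(q, s, float(ident), float(cov), float(ev), float(bits)))
    return hits


def passes(hit: Hit) -> bool:
    """Apply the E-value, identity, and coverage thresholds together."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.identity >= MIN_IDENTITY
        and hit.coverage >= MIN_COVERAGE
    )


def best_hits(hits: list[Hit]) -> dict[str, str]:
    """Map each query to its highest-scoring subject among passing hits."""
    best: dict[str, Hit] = {}
    for hit in filter(passes, hits):
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(forward: list[Hit], reverse: list[Hit]) -> list[tuple[str, str]]:
    """Keep pairs where each protein is the other's best passing hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


if __name__ == "__main__":
    print(reciprocal_best_hits(parse_hits(FORWARD), parse_hits(REVERSE)))
```

In this toy data, ref_B fails the identity threshold, ref_C fails the coverage threshold, and ref_D is dropped because its best target (tgt_5) matches ref_E better in the reverse search, so only the ref_A/tgt_1 pair remains as a candidate ortholog.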

Pipeline specifications

Software tools: BLASTP, GView, MUSCLE
Databases: Pfam, CyanoBase
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment
Organisms: Cyanobacterium stanieri
Chemicals: Adenosine Triphosphate, Carbon Dioxide, Sodium