Computational protocol: Early evolution of the biotin-dependent carboxylase family

Protocol publication

[…] For each domain of life for which sequence data were available, we retrieved one representative of each of the different biotin-dependent carboxylases and BPL/BirA enzymes from the KEGG database (http://www.genome.jp/kegg) to be used as seeds for further similarity searches. Since some biotin-dependent carboxylases were absent from this database, we completed the seed set with archaeal ACC/PCC [], bacterial GCC [], one bacterial UCA [] and one proteobacterial ODC [] sequence obtained from GenBank (http://www.ncbi.nlm.nih.gov/Genbank). Similarity searches with BLASTp [] were done using the well-characterized protein domains contained in these representative sequences as queries against their respective domain of life. In cases where a particular enzyme was missing in KEGG for one domain of life, we used sequences from the other domains as queries. Similarity searches in archaea and bacteria were done against a list of completely sequenced genomes available in GenBank (298 bacteria and 55 archaea, additional file ). In eukaryotes, all searches were done against the complete non-redundant (nr) eukaryote-annotated GenBank database.
Sequences for each protein domain found by these searches in the three domains of life were aligned with Muscle 3.6 [] or MAFFT v6.814c-b []. Alignments were edited with the program ED of the MUST package [], and redundant and partial sequences were removed at this step. Ambiguously aligned regions were removed prior to phylogenetic analyses using the NET program from the MUST package. Alignments are available in Nexus format as additional files , , , and . Preliminary secondary structure searches on MCC, GCC, PYC, XCC and accE (see results) were carried out using APSSP (Advanced Protein Secondary Structure Prediction Server, http://imtech.res.in/raghava/apssp/) and GOR4 []. [...]
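The removal of redundant and partial sequences described above was done by hand in the ED program. As a rough illustration of the same idea only, the following Python sketch drops exact duplicate sequences and sequences much shorter than the median length. The function names, the example sequences and the 50% length cutoff are assumptions made for this sketch and are not taken from the protocol.

```python
from statistics import median


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict (insertion-ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_redundant_and_partial(records, min_fraction=0.5):
    """Keep the first copy of each distinct sequence that is long enough.

    A sequence is treated as partial when it is shorter than
    min_fraction * median length (hypothetical cutoff, not from the protocol).
    Expects a non-empty dict of records.
    """
    cutoff = min_fraction * median(len(seq) for seq in records.values())
    seen = set()
    kept = {}
    for name, seq in records.items():
        if len(seq) < cutoff or seq in seen:
            continue
        seen.add(seq)
        kept[name] = seq
    return kept


# Made-up toy sequences: seqB duplicates seqA, seqC is a short fragment.
example = """>seqA
MKVLVANRGEIAVRIIRAC
>seqB
MKVLVANRGEIAVRIIRAC
>seqC
MKVLV
>seqD
MKILIANRGEIAIRVIRAC
"""

kept = drop_redundant_and_partial(parse_fasta(example))
print(sorted(kept))  # ['seqA', 'seqD']
```

Exact-match deduplication is the simplest possible criterion; a manual curation step such as the one in the protocol may also remove near-identical sequences, which this sketch does not attempt.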
Preliminary trees based on the complete sequence dataset for each enzyme were constructed by the approximately maximum likelihood approach with FastTree 2.1.3 [] in order to classify sequences into functional classes with respect to well-characterized proteins (see additional file ). Neighbor-joining (NJ) trees [] were also reconstructed using the MUST package [] to select representative sequences with which to carry out more detailed maximum likelihood (ML) and Bayesian inference (BI) phylogenetic analyses. ML tree reconstructions were done with the program TREEFINDER [] with the LG + Γ model [] and 4 rate categories, which was selected as the best-fit model for all our datasets by the model selection tool implemented in TREEFINDER []. Node support was assessed by 1,000 bootstrap replicates with the same model. BI trees were reconstructed using the program MrBayes v. 3.0b4 [] with a mixed substitution model and a Γ distribution of substitution rates with 4 categories. Searches were run with 4 chains of 1,000,000 generations, with trees sampled every 100 generations and the first 2,500 generations discarded as burn-in. Stabilization of the chain parameters was verified using the program TRACER []. Approximately unbiased tests [] were carried out using the test tool implemented in TREEFINDER []. […]
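To make the Bayesian sampling settings concrete, the short Python sketch below works out how many tree samples a chain of 1,000,000 generations yields when sampled every 100 generations, and how many remain after discarding the first 2,500 generations. It assumes that generation 0 is also sampled and that a sample taken exactly at generation 2,500 is retained; neither detail is stated in the protocol, and the function name is hypothetical.

```python
def split_burnin(generations, sample_every, burnin_generations):
    """Return (discarded, retained) lists of sampled generation numbers.

    Assumptions for this sketch: sampling starts at generation 0, and a
    sample is discarded only if it was taken strictly before the burn-in
    boundary.
    """
    sampled = range(0, generations + 1, sample_every)
    discarded = [g for g in sampled if g < burnin_generations]
    retained = [g for g in sampled if g >= burnin_generations]
    return discarded, retained


# Settings quoted in the protocol text.
discarded, retained = split_burnin(
    generations=1_000_000, sample_every=100, burnin_generations=2_500
)

print(len(discarded))  # 25 samples fall inside the burn-in
print(len(retained))   # 9976 samples are kept
```

Under these assumptions the burn-in removes about 0.25% of the sampled trees, which is why checking parameter stabilization separately (here with TRACER) matters.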

Pipeline specifications

Software tools BLASTP, MUSCLE, MAFFT, APSSP2, FastTree, MrBayes
Databases KEGG
Applications Phylogenetics, Protein structure analysis, Nucleotide sequence alignment
Chemicals Acetyl Coenzyme A, Acyl Coenzyme A, Biotin, Coenzyme A, Urea, Pyruvic Acid