Computational protocol: Phylogenomics of Prokaryotic Ribosomal Proteins

[…] The set of r-proteins used in this study was essentially the same as in . Ribosomal protein S1p was excluded from the list because of its varied domain architecture and the ubiquity of S1-like domains in a wide variety of RNA-associated proteins unrelated to the ribosome. The NCBI COG and arCOG databases were used as sources of the initial sets of 56 bacterial and 68 archaeal r-proteins, respectively. Initial sets of three novel archaeal r-proteins identified by Marquez et al. were retrieved using PSI-BLAST searches against the nr database. NCBI protein cluster PRK10057 was used as the initial set for ribosomal protein S22; the initial set for ribosomal protein S31e (a.k.a. THX peptide) was created by PSI-BLAST starting from the S31 protein of Thermus thermophilus HB8 (TTHA1396). Initial sets were aligned using MUSCLE, and the alignments were used as position-specific scoring matrices (PSSMs) in PSI-BLAST searches. [...] All sets of r-proteins were aligned using the MUSCLE program. Alignments for 50 bacterial r-proteins (all but S21, S22, L25, L30, S31, and L7ae) were filtered to retain only positions with less than 50% gap characters and then concatenated, producing a 6,127-position alignment. A maximum-likelihood (ML) tree was constructed using the FastTree program with the WAG evolutionary model and a discrete gamma model with 20 rate categories. A 7,843-position concatenated alignment of the 56 archaeal r-proteins that were present in all 87 archaeal genomes was used to reconstruct trees using the same procedure. A phylogenetic tree for a 4,226-position concatenated alignment of 32 universal r-proteins (except L30p and L7ae) from all archaea and selected bacterial and eukaryotic species was constructed in the same manner. Additionally, the optimal amino acid evolution model (LG+G) was selected for the alignment of 32 universal r-proteins using the ProtTest program. This model was used for phylogenetic reconstructions with taxon-sampled alignments using the RAxML program (see for details). [...]
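
The gap filtering and concatenation steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names, the dictionary representation of an alignment (genome name mapped to aligned sequence), and the choice to pad a genome that lacks a given protein with gap characters are all assumptions made for illustration; the text does not say how missing proteins were handled.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Keep only columns whose gap fraction is strictly below max_gap_fraction.

    alignment: dict mapping a genome name to its aligned sequence (hypothetical layout).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(seq[i] in gap_chars for seq in seqs) / len(seqs) < max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def concatenate_alignments(alignments, taxa):
    """Join per-protein alignments side by side, in the order given.

    Assumption: a genome absent from one alignment is padded with gaps of that width.
    """
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values()))) if aln else 0
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: two tiny "protein" alignments over three genomes.
protein_a = {"g1": "MK-L", "g2": "M--L", "g3": "MR-L"}
protein_b = {"g1": "AV", "g3": "AI"}

filtered = [filter_gappy_columns(a) for a in (protein_a, protein_b)]
supermatrix = concatenate_alignments(filtered, taxa=["g1", "g2", "g3"])
```

In the toy data, the third column of `protein_a` consists only of gaps and is dropped, while the second column (one gap in three sequences) is kept. A column with exactly 50% gaps would be removed, because the criterion is "less than 50%".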
Gene gains and losses were mapped to the phylogenetic trees using Dollo parsimony analysis as implemented in the DOLLOP program of the PHYLIP package. […]
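
To make the Dollo criterion concrete, the following sketch maps a single presence/absence character onto a fixed rooted tree. Under Dollo parsimony a gene may be gained only once, so the gain is placed on the branch leading to the smallest clade that contains every genome carrying the gene, and a loss is placed on each branch inside that clade that leads to a subtree with no carriers. This is an independent illustration of the principle, not DOLLOP's implementation. The function name, the nested-tuple tree encoding, and the convention of identifying a branch by the sorted leaf names beneath it are assumptions.

```python
def dollo_map(tree, present):
    """Place one gain and the implied losses for a presence/absence character.

    tree: nested tuples with leaf names as strings (hypothetical encoding).
    present: iterable of leaf names that carry the gene.
    Returns (gain_clade, loss_clades); each clade is a sorted tuple of leaf names
    and stands for the branch leading to that clade.
    """
    present = set(present)

    def leaves(node):
        if isinstance(node, str):
            return (node,)
        return tuple(sorted(leaf for child in node for leaf in leaves(child)))

    def has_carrier(node):
        return bool(present & set(leaves(node)))

    if not has_carrier(tree):
        return None, []  # gene absent everywhere: no gain, no losses

    # Descend while exactly one child contains carriers; stop at the smallest
    # clade holding all of them (their last common ancestor).
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if has_carrier(child)]
        if len(carriers) != 1:
            break
        node = carriers[0]
    gain = leaves(node)

    # Below the gain, every child subtree without carriers implies one loss.
    losses = []

    def walk(current):
        if isinstance(current, str):
            return
        for child in current:
            if has_carrier(child):
                walk(child)
            else:
                losses.append(leaves(child))

    walk(node)
    return gain, losses


toy_tree = ((("A", "B"), "C"), ("D", "E"))
gain, losses = dollo_map(toy_tree, {"A", "C"})
```

For the toy tree, a gene found in A and C is gained once on the branch leading to the clade (A, B, C) and lost once on the branch leading to B. The sketch recomputes leaf sets at every step for clarity, which is adequate for small trees but not tuned for large ones.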

