Computational protocol: Stability along with Extreme Variability in Core Genome Evolution

Protocol publication

[…] Reciprocal BlastP searches (e-value threshold 0.01, no composition-based statistics adjustment) were performed between the members of each pair or triplet of genomes. For the pairs, bidirectional best hits (BBHs) were recorded as a proxy for orthologs; in the genome triplets, strict BBH triangles were treated as triplets of orthologs.
Alignments of putative ortholog pairs or triplets were produced using the MUSCLE alignment program. Distances between sequences were calculated using the FastTree program, which produces log-corrected distances calculated with the BLOSUM45 amino acid similarity matrix. If the sequences of orthologs were exactly identical, they were assigned a distance of 0.5 divided by the alignment length. The same software (MUSCLE and FastTree) was used to reconstruct approximate maximum likelihood (ML) phylogenetic trees from multiple alignments of Clusters of Orthologous Groups (COG) representatives that were used for Xenologous Gene Displacement (XGD) detection. The trees in the Newick format are available at […].
It should be noted that with such closely related species, the quality of the pairwise alignments, and therefore the accuracy of the distance estimates, is not expected to be a problem. The most distant pair of sequences in the entire P1 set (YP_004138486 vs. YP_003007995, COG0221) has three indels within a 175-amino-acid protein sequence alignment and a reported BlastP e-value of 4 × 10⁻²⁵ at 28% sequence identity. However, to test the robustness of the results to potential inaccuracy of the sequence alignments, we produced a variant of the P1 data set in which distances were estimated only for alignments with ≥40% identity (set P1′).
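The BBH and BBH-triangle selection described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `(query, subject, bit score)` tuple input, and the choice to rank hits by bit score (keeping the first hit on ties) are all assumptions made for illustration, since the text does not specify how the best hit was chosen.

```python
def best_hits(hits):
    """Map each query to its top-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples for one
    search direction. Ranking by bit score is an assumption; on ties the
    first hit encountered is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return the set of (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def bbh_triangles(bbh_ab, bbh_bc, bbh_ac):
    """Return (a, b, c) triplets in which all three pairs are BBHs.

    Each argument is a set of BBH pairs between two genomes, oriented as
    (a, b), (b, c) and (a, c) respectively.
    """
    b_to_c = dict(bbh_bc)
    a_to_c = dict(bbh_ac)
    return {
        (a, b, b_to_c[b])
        for a, b in bbh_ab
        if b in b_to_c and a_to_c.get(a) == b_to_c[b]
    }
```

In practice the hit tuples would be parsed from the BlastP output after applying the e-value threshold; that parsing step is left out here.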
To test the robustness of the results to the distance calculation method, we produced a variant of the P1 data set in which distances were estimated using the Protdist program of the PHYLIP package with the Jones, Taylor, and Thornton evolutionary model and gamma-distributed site evolution rates with a shape parameter of 1 (set P1″). […]
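Two of the alignment-level rules in this protocol, the distance of 0.5 divided by alignment length for identical orthologs and the ≥40% identity cut-off used for the identity-filtered variant of the P1 set, can be sketched as follows. The function names are hypothetical, and the definition of percent identity (matches over columns where both sequences have a residue) is an assumption; the source does not state which definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue.

    This definition is an assumption made for the sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def ortholog_distance(aligned_a, aligned_b, raw_distance):
    """Apply the identical-sequence rule to a precomputed distance.

    `raw_distance` stands in for the value reported by the distance
    program; identical sequences get 0.5 / alignment length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if aligned_a == aligned_b:
        return 0.5 / len(aligned_a)
    return raw_distance


def passes_identity_filter(aligned_a, aligned_b, min_identity=40.0):
    """Keep only alignments at or above the identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= min_identity
```

The identical-sequence rule replaces a zero distance with a small positive value, half of one substitution per alignment length, so that identical pairs remain usable in downstream calculations that cannot handle zeros.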

Pipeline specifications