Computational protocol: Population expansions shared among coexisting bacterial lineages are revealed by genetic evidence

Protocol publication

[…] The complete sequences of the 16S rRNA gene and partial sequences of the MLST genes were edited and aligned using BioEdit. Representative 16S rRNA sequences for the three genera were obtained from GenBank and included in the alignments as references (see – for accession numbers).

We calculated the Pairwise Homoplasy Index (ΦW) using the “PHI test recombination” function implemented in SplitsTree4 to verify clonality for all lineages, as has been reported for Bacillus, Exiguobacterium and Pseudomonas. Since all lineages are clonal, we were able to use concatenated data sets for phylogenetic reconstruction and population genetics analyses.

For all genera, maximum likelihood (ML) phylogenies were constructed using (i) 16S rRNA sequences and (ii) concatenated alignments of MLST sequences. jModelTest v.2.1.3 was used to determine the best nucleotide substitution model for each alignment. For the 16S rRNA gene, TPM2uf+I+G, TIM1+I+G and GTR+I+G were the substitution models selected for Bacillus, Exiguobacterium and Pseudomonas, respectively. GTR+I+G, TIM2+I+G and GTR+I+G were the models selected for the Bacillus, Exiguobacterium and Pseudomonas concatenated MLST alignments, respectively. Phylogenetic relationships were reconstructed using PhyML v.3.0 with the corresponding DNA substitution models; tree improvement was carried out by Subtree Pruning and Regrafting, and branch support was evaluated with 1,000 bootstrap pseudo-replicates.

[...] We defined populations as the minimal study unit in our collection. Populations have to be defined considering both geographic and genetic criteria. Given that it is possible to find populations (genetic pools) that are not site-restricted, we investigated population structure through genetic differentiation, which is defined as changes in the frequency distribution of haplotype variants among subpopulations. We estimated pairwise FST values among sampling sites within lineages.
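To make the pairwise FST comparison between two sampling sites concrete, the following is a minimal sketch in Python. It is not Arlequin's implementation: it assumes a sequence-based estimator of the Hudson, Slatkin and Maddison type (FST = 1 − π_within / π_between) and a simple permutation test that shuffles isolates between the two sites. Arlequin's own estimator and permutation scheme may differ. All function names and the toy sequences are hypothetical, and each site is assumed to contribute at least two aligned sequences of equal length.

```python
import random
from itertools import combinations, product


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(seqs):
    """Mean pairwise difference within one site (needs >= 2 sequences)."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(site1, site2):
    """Mean pairwise difference between sequences from two sites."""
    total = sum(hamming(a, b) for a, b in product(site1, site2))
    return total / (len(site1) * len(site2))


def pairwise_fst(site1, site2):
    """Assumed estimator: FST = 1 - pi_within / pi_between."""
    pi_within = (mean_within(site1) + mean_within(site2)) / 2
    pi_between = mean_between(site1, site2)
    if pi_between == 0:
        return 0.0  # no variation at all between the two sites
    return 1 - pi_within / pi_between


def permutation_test(site1, site2, n_perm=1000, seed=0):
    """Return (observed FST, permutation p-value).

    Isolates are shuffled between the two sites; the p-value is the share
    of permuted data sets whose FST is at least the observed value.
    """
    rng = random.Random(seed)
    observed = pairwise_fst(site1, site2)
    pooled = list(site1) + list(site2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pairwise_fst(pooled[:len(site1)], pooled[len(site1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up toy alignments for two sampling sites.
    site_a = ["AAAA", "AAAA", "AAAT"]
    site_b = ["TTTT", "TTTT", "TTTA"]
    fst, p_value = permutation_test(site_a, site_b)
    print(f"FST = {fst:.4f}, p = {p_value:.3f}")
```

Under this sketch, two sites would be pooled into one population when the permutation p-value is not significant at the chosen threshold.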
With this approach we defined a single population as all isolates from a lineage that exhibited no significant genetic differentiation as measured by pairwise FST estimates. Pairwise FST estimates were obtained for each monophyletic lineage across all sampling sites using Arlequin 3.5, with 1,000 iterations. For the Pseudomonas isolates, FST analysis was not performed since all isolates were from the same sampling site.

The FST approach implicitly incorporates geographic location and genetic criteria in the investigation of population structure. However, genetic structure may still exist within each lineage at the level of individual genetic variants within each population. Thus, we investigated potential substructure within the groups defined by FST using the Bayesian approach implemented in BAPS 6. The Bayesian analysis of population structure was performed using the option for linked loci, developed specifically for MLST data. The maximum number of clusters (K) was set to ten, or to the number of individuals when there were fewer than ten. Each analysis was replicated ten times.

Once populations were defined, standard measures of nucleotide diversity (π) and the mutation parameter Watterson’s θ were estimated, together with neutrality tests (Tajima’s D, Fu and Li’s F* and D*, and Fu’s FS), using DnaSP v4.1. Details of these calculations can be found in the .

[...] We performed an Extended Bayesian Skyline Plot (EBSP) analysis as implemented in BEAST v.1.7.5. The EBSP infers past population dynamics from a sample of contemporary sequences, taking into account the genealogical stochasticity of the coalescent. Additionally, this method does not depend on a specific a priori demographic model, but infers the number of population size changes directly from the data.
As a result, it provides a measure of statistical credibility of the inferred number of population size changes compared to the alternative of constant population size.

For the EBSP analysis, we used all MLST genes, a strict molecular clock, and the same nucleotide substitution models used for phylogenetic reconstruction to estimate changes in population size for each genetically homogeneous population. The substitution rate used for the two Firmicutes genera (Bacillus and Exiguobacterium) was 7 × 10⁻⁹ substitutions per site per generation, obtained from a dated phylogenetic tree. The substitution rate used for Pseudomonas was 2.8 × 10⁻⁸ substitutions per site per generation, based on an experimental evolution study of P. fluorescens. All time estimates were expressed in number of generations. Changes in population size were expressed as a function of the product of Ne and the generation time (Ne × t). All analyses were run for 50–100 million MCMC generations, until adequate mixing was achieved. The first ten percent of each run was discarded as burn-in, and the chain was sampled every 1,000 generations. The rest of the parameters were set according to the guidelines of . Results were analyzed with Tracer v.1.5 and LogCombiner v.1.7.5. […]
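The burn-in and summary step described above can be illustrated with a short Python sketch. It is not the Tracer or LogCombiner code. It assumes that the logged values of one parameter have already been read into a list, and that the summary of interest is a 95% highest posterior density (HPD) interval, taken here as the shortest interval containing 95% of the retained samples. The function names are hypothetical. For scale, a run of 50 million generations sampled every 1,000 generations yields about 50,000 logged samples, of which roughly 5,000 are discarded by a ten percent burn-in.

```python
import math


def drop_burnin(samples, burnin_percent=10):
    """Discard the first `burnin_percent` percent of the logged samples."""
    n_drop = len(samples) * burnin_percent // 100
    return samples[n_drop:]


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains `mass` of the samples (assumed HPD)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


if __name__ == "__main__":
    # Made-up trace standing in for one logged parameter.
    trace = [float(i % 17) for i in range(1000)]
    kept = drop_burnin(trace, burnin_percent=10)
    low, high = hpd_interval(kept, mass=0.95)
    print(len(kept), low, high)
```

Integer arithmetic is used for the burn-in count so that the number of discarded samples does not depend on floating-point rounding.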

Pipeline specifications