Computational protocol: Evidence of diversity and recombination in Arsenophonus symbionts of the Bemisia tabaci species complex

Protocol publication

[…] Multiple sequences were aligned using the MUSCLE algorithm implemented in CLC DNA Workbench 6.0 (CLC Bio). Phylogenetic analyses were performed using maximum-likelihood (ML) and Bayesian inference for each locus separately and for the concatenated data set.

jModelTest v.0.1.1 was used to select the best-fit models of nucleotide substitution using the Akaike Information Criterion (AIC). The corrected version of the AIC (AICc) was used for each data set because the sample size (n) was small relative to the number of parameters (K), with n/K < 40. This approach suggested the following models: HKY for fbaA, GTR for ftsK, HKY+I for yaeT and GTR+I for the concatenated data set. Under the selected models, the parameters were optimized and ML analyses were performed with PhyML v.3.0. The robustness of nodes was assessed with 100 bootstrap replicates for each data set.

Bayesian analyses were performed with MrBayes v.3.1.2. According to the Bayesian Information Criterion (BIC) estimated with jModelTest, the selected models were the same as for the ML inferences. For the concatenated data set, the same models were used for each gene partition. Analyses were initiated from random starting trees. Two separate Markov chain Monte Carlo (MCMC) runs, each composed of four chains, were set to run for 5 million generations, with the "stoprule" option ending a run before the fixed number of generations once the convergence diagnostic (the average standard deviation of split frequencies) fell below 0.01. As a result, the number of generations was 3,000,000 for fbaA, 600,000 for ftsK, 2,100,000 for yaeT and 1,000,000 for the concatenated data set. A burn-in of 25% of the sampled generations was discarded and posterior probabilities were computed from the remaining trees.
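As an illustration of the model-selection criterion described above, the following minimal Python sketch computes the AIC and its small-sample correction (AICc) and applies the n/K < 40 rule of thumb. It is not jModelTest code. The function names, the log-likelihoods, the parameter counts and the sample size are all hypothetical, and n is assumed here to be the alignment length in sites.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2K - 2 lnL."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC: AICc = AIC + 2K(K + 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def needs_correction(n, k, threshold=40):
    """Rule of thumb from the text: prefer AICc when n/K < 40."""
    return n / k < threshold


def best_model(candidates, n):
    """Return the name of the candidate with the lowest AICc.

    `candidates` maps a model name to a (log-likelihood, K) pair.
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n))


# Hypothetical values, for illustration only (not results from the study).
example_models = {
    "HKY": (-1502.3, 24),
    "GTR": (-1498.9, 28),
}
example_n = 600  # assumed alignment length in sites

print(needs_correction(example_n, 28))
print(best_model(example_models, example_n))
```

The correction term grows as K approaches n, so AICc penalizes parameter-rich models more strongly than AIC on short alignments and converges to AIC as n becomes large.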
All runs of each analysis converged, with PSRF (potential scale reduction factor) values at 1.

In addition, the Arsenophonus strains identified in the present study were used to infer a phylogeny on a larger scale, together with the Arsenophonus sequences from various insect species obtained from Duron et al. The GTR+G model was used for both methods (ML and Bayesian inference), and the number of generations was 360,000 for the Bayesian analysis. [...]

Identical DNA sequences at a given locus for different strains were assigned the same arbitrary allele number (i.e. each allele has a unique identifier). Each unique allelic combination corresponded to a haplotype.

Genetic diversity was assessed using several functions from the DnaSP package by calculating the average number of pairwise nucleotide differences per site among the sequences (π), the total number of mutations (η), the number of polymorphic sites (S) and the haplotype diversity (Hd). The software Arlequin v.3.01 was used to test for geographical or species structure among the different population groups by an AMOVA (analysis of molecular variance). The analyses partitioning the observed nucleotide diversity were performed between and within sampling sites (countries, localities) or species (B. tabaci species, T. vaporariorum and B. afer). For each analysis, genetic variation was partitioned into the following three levels: between groups (FCT), between populations within groups (FSC) and within populations (FST). Significance was tested by 10,000 permutations as described by Excoffier et al. […]
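The allele and haplotype numbering and the diversity indices described above can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the DnaSP implementation: sequences are assumed to be aligned, of equal length and free of gaps or ambiguous bases, and all function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def assign_ids(items):
    """Give identical items the same arbitrary integer, in order of first appearance."""
    seen = {}
    return [seen.setdefault(item, len(seen) + 1) for item in items]


def polymorphic_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def total_mutations(seqs):
    """Eta: per column, number of distinct nucleotides minus one, summed over columns."""
    return sum(len(set(column)) - 1 for column in zip(*seqs))


def nucleotide_diversity(seqs):
    """Pi: average number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / len(pairs) / len(seqs[0])


def haplotype_diversity(labels):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(labels)
    counts = Counter(labels)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Toy data for two hypothetical loci in four strains (not sequences from the study).
locus_a = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGAAC"]
locus_b = ["GGTT", "GGTT", "GGTT", "GGTA"]

alleles_a = assign_ids(locus_a)
alleles_b = assign_ids(locus_b)

# A haplotype is a unique combination of allele numbers across loci.
haplotype_ids = assign_ids(list(zip(alleles_a, alleles_b)))

print("alleles at locus A:", alleles_a)
print("alleles at locus B:", alleles_b)
print("haplotypes:", haplotype_ids)
print("S:", polymorphic_sites(locus_a))
print("eta:", total_mutations(locus_a))
print("pi:", nucleotide_diversity(locus_a))
print("Hd:", haplotype_diversity(haplotype_ids))
```

S and η differ only at sites carrying three or four distinct nucleotides, where a single polymorphic site implies at least two mutations.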

Pipeline specifications