Computational protocol: A depauperate immune repertoire precedes evolution of sociality in bees

Similar protocols

Protocol publication

[…] The genomes of haploid males from a single colony of B. terrestris and of B. impatiens were sequenced by the Bumblebee Genome Consortium and the details of the sequencing, assembly, and automated annotation can be found in []. Using OrthoDB [,] orthologous groups, we identified orthologs from the two bumblebees, as well as from Apis florea, M. rotunda, and N. vitripennis, Tribolium castaneum, of previously characterized immune genes from D. melanogaster, A. gambiae, and A. mellifera that comprise 27 immune-related gene families or pathways. To complement the orthology searches, we searched for homologs of known immune proteins from the two bumblebees using blastp [,] against the official gene sets (NCBI RefSeqs). To confirm the absence of any proteins that appeared to be missing, we searched the genome assemblies and Short Reads Archive using tblastn. [...] We retrieved protein sequences of selected gene families from OrthoDB [,] and aligned them using ClustalW [] and adjusted the alignments manually or with Gblocks [] before tree-building using MrBayes [] with the mixed model. We ran MrBayes for as long as was necessary (typically for 20,000 to 400,000 generations) to reduce the average deviation of split frequencies below 0.01 or until the split frequency approached 0.01 but did not improve further. We discarded the initial 25% of the trees as a burn-in. [...] We extracted orthologous groups of immune-related genes from OrthoDB6 [,]. From the 130 orthologous groups with sequences from B. terrestris, B. impatiens, A. mellifera and A. florea we extracted 148 multiple sequence alignments containing exactly one sequence from each species. We use these 148 alignments for comparisons between the Bombus and Apis clades. From the 122 orthologous groups that contain M. rotundata sequences we further extracted 139 alignments that also contain a M. rotundata ortholog, which we use as an outgroup to compare social with solitary (non-social) bees. In six of the alignments (abaecin, basket, cactus, defensin, kayak and tak1) one or more orthologs were not present in OrthoDB6 and had to be retrieved from an alternative source (that is, NCBI). Protein sequences were aligned independently for the four-taxa (Bombus and Apis) or five-taxa trees (with Megachile) with ProGraphMSA [] and trimmed using Gblocks with the stringent settings as described in []. Where orthologous groups contained multiple sequences for some species the most closely related sets of sequences were aligned. In the 12 orthologous groups that contained more than one sequence for each species we extracted the maximum number of alignments, such that each alignment contains only one sequence from each species. We retrieved cDNA sequences for the alignments from the official gene sets (A. mellifera v.4.5, A. florea v.1.0, B. impatiens v.2.0, B. terrestris v.1.0, M. rotundata v.1.0) using a custom written Python script.We used likelihood-based codon models implemented in the PAML package [] to analyze differences in the rate of evolution and to test for signals of selection. We tested hypotheses by using likelihood ratio tests to select the best fitting model among pairs of nested models that differ only in their representation of ω, the ratio of non-synonymous to synonymous substitutions (ω = dN/dS). We make the assumption that ω > 1 indicates positive selection, while ω < 1 and ω = 1 indicates negative and neutral selection.The average rate of evolution was determined using the M0 [] model, which assumes a constant ω across all sites and branches. The average ω is not a good indicator for the presence of positive selection, since functional and structural constraints ensure that most sites in functional genes are conserved []. Hence, we used the M7 and M8 models to test for the presence of positively selected sites. []. Both models allow ω to vary from site-to-site according to a Beta distribution, but the M8 model additionally allows some sites to evolve with ω > 1, to account for sites under positive selection.In order to detect episodes of positive selection on the connecting branches between clades we used the branch-site model [,]. Some branches are assigned a priori to the foreground, where some sites are allowed to evolve with ω > 1, while all sites on background branches are constrained to ω ≤ 1. The branch-site model is compared to a null model where there is no difference between foreground and background branches. We used Clade model D [] to test for more general differences between clades. This model allows ω to differ between clades in some sites. It is compared to a null model where there are no differences in ω between clades.To ensure that the PAML optimization did not get stuck in local optima we used six different initial estimates for ω in all analyses and initialized branch lengths to values calculated with PhyML []. We corrected for multiple testing by controlling the false discovery rate using the method of Benjamini and Hochberg []. To calculate the posterior probabilities of sites being under positive selection in the M8 and Branch-site models we used the Bayes Empirical Bayes (BEB) approach implemented in PAML [].We repeated the analyses using Probcons [] for aligning sequences. However, we only report the results from alignments produced by ProGraphMSA, as these alignments give more conservative estimates and hence a smaller chance of falsely reporting positive selection. Similarly, we do not report results from using Gblocks with the relaxed settings, as described in [], or no trimming at all. These results are available from the authors. […]

Pipeline specifications

Software tools Clustal W, Gblocks, MrBayes, PAML, PhyML, ProbCons
Databases OrthoDB
Applications Phylogenetics, Amino acid sequence alignment
Organisms Apis mellifera, Bombus impatiens, Bombus terrestris