Computational protocol: Odorant Binding Proteins of the Red Imported Fire Ant, Solenopsis invicta: An Example of the Problems Facing the Analysis of Widely Divergent Proteins

Protocol publication

[…] Expecting the generally divergent nature of OBP sequences (∼20% amino acid identity over all sequences) to make sequence alignment problematic, we compared the outcomes of six multiple sequence alignment (MSA) approaches, which differ greatly in popularity and in their general approach to the MSA problem. We used default parameters for all alignment estimates. Nucleotide (codon) alignments were based on the amino acid alignments.
In addition, we used BAli-Phy 2.0.2 to simultaneously estimate the alignment and phylogeny of each species' OBPs in a Bayesian framework. Since BAli-Phy is computationally intensive and generally considered too slow to be used efficiently with more than a dozen sequences, we analyzed the ant and bee datasets independently. Additionally, we removed six bee OBPs from the well-supported C-minus expansion to reduce the computational burden. We used default parameters for each run of 100,000 generations. Stationarity of the searches was verified using Tracer 1.5. The first 9999 samples were removed as burn-in. The lowest effective sample size (ESS) for any parameter estimate was 802.3378, suggesting that we had run the analyses long enough to enable meaningful estimates from the posterior sampling.
The alignments were compared using a range of ad hoc heuristic criteria. First, we visually compared the alignments for congruence in how they aligned individual sections (especially the inner core) using AltAVisT, and compared the overall sequence identity calculated from each alignment. We then tested for sequence saturation using both the Steel method (for amino acids) and the Xia method (for nucleotides) in DAMBE. Finally, we compared each alignment's ability to capture phylogenetic signal relative to the other alignment methods (using ML trees; see below).
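
The overall sequence identity used to compare alignments can be illustrated with a minimal Python sketch. The protocol does not say which definition of identity was used, so the choice here (identical residues divided by the columns in which neither sequence has a gap, averaged over all sequence pairs) is an assumption, and the sequence names and residues in `toy` are made up for illustration.

```python
from itertools import combinations

GAP_CHARS = frozenset("-.")


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average pairwise identity over all sequence pairs in a {name: aligned_seq} dict."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the study.
toy = {"obp_a": "MKT-LLV", "obp_b": "MKTALIV", "obp_c": "MRT-LLV"}
print(round(mean_pairwise_identity(toy), 3))
```

Other denominators (alignment length, or the length of the shorter sequence) give different values, so identities are only comparable across alignments when the same definition is applied to each.
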
To assess phylogenetic signal, we compared log-likelihoods, tree length (measured by parsimony steps of the phylogeny and ML tree size), and average aLRT branch support, as well as the Robinson-Foulds tree distance to the ant and bee MAP trees, computed using the TreeDist program in the PHYLIP 3.69 package. […]
We used the ProtTest server to estimate the best-fitting model of amino acid substitution for each alignment using the Bayesian information criterion (BIC). Tree topologies were optimized starting from an initial BioNJ tree. Phylogenetic hypotheses under the maximum likelihood criterion were derived from the amino acid alignments using PhyML 3. We implemented the model consistently chosen by the BIC (LG) while estimating the proportion of invariable sites (+I) and the gamma shape parameter (+Γ) with 4 rate categories. Tree searches started from five random starting trees and used subtree pruning and regrafting (SPR) and nearest-neighbor interchange (NNI) to optimize topologies. Branch lengths were optimized and branch support was estimated using the SH-like aLRT.
We also employed MrBayes 3.1.2 to compare phylogenetic hypotheses derived from the amino acid and nucleotide datasets. Owing to the computational burden of the Bayesian analyses, we performed these only on the two best alignments (MUSCLE and PRANK). For each alignment, we performed two searches using different models of sequence evolution. For the amino acid dataset we employed model averaging to incorporate model selection in the Markov chain Monte Carlo (MCMC) search. For the nucleotide codon alignment we implemented the GTR+I+Γ model. Four chains were run for 5 million generations (one cold and three heated; temperature = 0.02–0.03). The MCMC was sampled every 1000 generations. All other parameters were left at program defaults. Convergence was assessed using the average standard deviation of split frequencies, potential scale reduction factor (PSRF) values, plateauing of log-likelihood values, and ESS values >100. […]
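
To make the Robinson-Foulds comparison concrete, the sketch below counts the non-trivial bipartitions found in one unrooted tree but not the other. It is an independent illustration, not the code used by PHYLIP's TreeDist. It assumes simple Newick strings (no quoted labels or comments) and identical taxon sets, and the example trees are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names.

    Branch lengths and internal node labels (e.g. support values) are discarded.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].split(":")[0]

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError("malformed Newick string")
            pos += 1
            read_label()  # skip internal label and branch length
            return tuple(children)
        return read_label()

    return read_node()


def clades(node, out):
    """Return the leaf set below node, appending every internal clade to out."""
    if isinstance(node, str):
        return frozenset([node])
    clade = frozenset().union(*(clades(child, out) for child in node))
    out.append(clade)
    return clade


def nontrivial_splits(newick):
    """Return (taxa, splits); each split is stored as its canonical side."""
    found = []
    taxa = clades(parse_newick(newick), found)
    splits = set()
    for clade in found:
        if 1 < len(clade) < len(taxa) - 1:
            other = taxa - clade
            # Pick one side deterministically so the root position does not matter.
            splits.add(min(clade, other, key=lambda s: (len(s), sorted(s))))
    return taxa, splits


def robinson_foulds(newick_a, newick_b):
    """Number of bipartitions present in exactly one of the two trees."""
    taxa_a, splits_a = nontrivial_splits(newick_a)
    taxa_b, splits_b = nontrivial_splits(newick_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must share the same taxon set")
    return len(splits_a ^ splits_b)


# Hypothetical five-taxon trees, not trees from the study.
print(robinson_foulds("((A,B),(C,D),E);", "((A,C),(B,D),E);"))
```

A distance of zero means the two topologies share all internal splits. Larger values indicate that an alignment's ML tree departs further from the reference (here, MAP) tree.
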
We conducted analyses of positive selection using the codeml program in the PAML 4.3 package. Since codeml requires a fully resolved tree, we used the ML trees of the PRANK, MUSCLE, CLUSTAL, and BAli-Phy alignments as input. These represent the two “best” alignments as well as the longest and the shortest alignments. We estimated branch lengths under the F3×4 codon model on the respective topologies. We conducted site-specific tests of selection. We were also specifically interested in whether positive selection had influenced the divergence of the ant-specific expansion. Hence, we performed branch-specific tests of selection on the branch leading to this clade. However, under certain circumstances the branch-specific test of selection can lack power, so we also used the branch-site test of selection, implementing the Bayes empirical Bayes (BEB) method to identify sites under selection. To ensure that the analyses had converged properly, we repeated each analysis three times from different starting parameter options and under different codon models. […]
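
Tests of selection based on codeml output are usually evaluated with a likelihood ratio test between nested models. The sketch below shows that calculation under stated assumptions: the excerpt does not give the model pairs or degrees of freedom, so `df` is left to the caller, only the closed-form chi-square tails for 1 and 2 degrees of freedom are implemented, and the log-likelihood values are hypothetical placeholders.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a null model nested in the alternative."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a null and an alternative model; not study results.
stat, p_value = likelihood_ratio_test(lnl_null=-5230.41, lnl_alt=-5224.87, df=2)
print(f"2*dlnL = {stat:.2f}, p = {p_value:.4f}")
```

Repeating each codeml run from different starting values, as described above, guards against comparing a log-likelihood from a local optimum, which could otherwise inflate or deflate the test statistic.
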

Pipeline specifications

Software tools: BAli-Phy, AltAVisT, DAMBE, PHYLIP, ProtTest, PhyML, MrBayes, MUSCLE, PAML
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Solenopsis invicta, Apis mellifera