Computational protocol: Mitochondrial genome evolution in fire ants (Hymenoptera: Formicidae)

[…] Mitogenomes were annotated using the DOGMA webserver [], which uses BLASTX against a custom database to identify protein-coding genes. We verified all DOGMA annotations: coding regions were checked against an S. invicta EST database [], and tRNAs were validated with ARWEN 1.2 [] and tRNAscan-SE 1.21 [] because DOGMA relies only on COVE [] to identify tRNAs. tRNAscan-SE generally has a very low false-positive rate and thus rarely mispredicts tRNAs (COVE scores ≥ 20 are usually considered reliable []), whereas ARWEN has a low false-negative rate and usually identifies all tRNAs []. DOGMA generally identified significantly more tRNAs than either ARWEN or tRNAscan-SE, sometimes with quite high COVE scores. Two tRNAs in particular, tRNA-S1 and tRNA-N, were not recovered; however, they could be folded manually. [...]

Nucleotide sequences were aligned on the basis of amino acid alignments generated with MUSCLE 3.6 []. Models of nucleotide evolution were estimated for protein-coding genes using jModeltest []. DnaSP 4.50.3 [] was used to estimate codon usage bias and nucleotide frequency bias [-]. The CODEML program in the PAML 4.2 package [] was used to test for site-specific evidence of positive selection while correcting for nucleotide bias []. We used the following parameters: runmode = 0, omega and kappa estimated (from three different starting points), and empirical codon frequencies calculated from the nucleotide frequencies at each codon position (codonfreq = 2).

Following the recommendations of Posada [], we employed a suite of recombination detection programs offered in the TOPALi 2.5 [] and RDP 3b32 [] packages and on the RecombiTest website [] to test for recombination in the Solenopsis mitogenomes (see Table for the specific tests used). When a recombination test analyzed only three sequences at a time (e.g., RDP), the analysis was repeated for every possible sequence triplet and the p-values were Bonferroni corrected.
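Repeating a triplet-based test over every combination of three sequences and Bonferroni-correcting the results can be sketched as follows. This is a minimal illustration under stated assumptions, not the RDP implementation: `bonferroni_triplets` and `triplet_test` are hypothetical names, `triplet_test` stands in for any test that returns one p-value per triplet, and the 0.05 threshold is taken from the text. With n sequences there are C(n, 3) triplets, so each raw p-value is multiplied by that number and capped at 1.

```python
from itertools import combinations
from typing import Callable, Sequence

Triplet = tuple[str, str, str]


def bonferroni_triplets(
    seq_ids: Sequence[str],
    triplet_test: Callable[[Triplet], float],  # hypothetical wrapper around a triplet-based test
    alpha: float = 0.05,
) -> list[tuple[Triplet, float, float, bool]]:
    """Run a three-sequence test on every triplet and Bonferroni-correct the p-values."""
    triplets = list(combinations(seq_ids, 3))
    m = len(triplets)  # number of tests performed, C(n, 3)
    results = []
    for trip in triplets:
        p = triplet_test(trip)
        p_adj = min(1.0, p * m)  # Bonferroni: scale by the number of tests, cap at 1
        results.append((trip, p, p_adj, p_adj <= alpha))
    return results


if __name__ == "__main__":
    ids = ["S1", "S2", "S3", "S4", "S5"]  # placeholder sequence names

    # Hypothetical stand-in for a real triplet-based recombination test.
    def fake_test(trip: Triplet) -> float:
        return 0.001 if "S5" in trip else 0.5

    for trip, p, p_adj, sig in bonferroni_triplets(ids, fake_test):
        print(trip, p, round(p_adj, 3), sig)
```

With five sequences the correction factor is C(5, 3) = 10, so a raw p-value of 0.001 becomes 0.01 and stays significant, whereas 0.5 is capped at 1.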
All settings were left at the software defaults for the initial analyses, except for the PDM and LRT tests, for which we used flexible window sizes. Results with p ≤ 0.05 were considered significant (after Bonferroni correction, where applied). Loosely following the criteria of Tsaousis et al. [] for evidence of recombination, we considered a recombination event well supported when it was detected by more than one test, regardless of whether those tests were global or local methods. The more tests that recovered evidence for recombination, the more confident we are that it represents a true recombination event. Although this classification is admittedly arbitrary, we agree with White et al. [] that identifying recombination is inherently difficult and requires the heuristic use of several methods to identify potential recombinants.

Phylogenetic analyses were conducted on the protein-coding genes of the hymenopteran mitogenomes and nine outgroups (three flies [GenBank: X03240, AF260826, AJ242872], three beetles [GenBank: AJ312413, DQ768215, AB267275], and three moths [GenBank: AF442957, AF149768, AY242996]). jModeltest [] was used to estimate the most appropriate model of nucleotide evolution for each codon position at each locus separately. Following the suggestion of Dowton et al. [], we used a Bayesian approach on nucleotide sequences and applied the GTR+I+Γ model of sequence evolution across genes and codon positions, since jModeltest usually identified this model as the best fit for each data partition. MrBayes 3.1.2 [] was then used to recover phylogenetic hypotheses, with all parameters unlinked between partitions. Two independent analyses were run for three million generations, each with three heated chains and one cold chain, and parameters were sampled every 1,000 generations. The runs were considered converged when log-likelihoods had plateaued, potential scale reduction factors (PSRF) were ~1, and the average standard deviation of split frequencies had dropped below 0.01. Samples taken before convergence were discarded before the remaining samples were summarized.
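The split-frequency criterion can be illustrated with a short sketch of the average standard deviation of split frequencies (ASDSF) across independent runs. MrBayes computes this statistic itself; the sketch only shows what the < 0.01 threshold measures. Assumptions: each run is summarized as a mapping from a split (encoded here, for illustration, as the frozenset of taxa on one side of a bipartition) to its posterior frequency; splits below 0.10 in every run are ignored, which we assume mirrors the MrBayes `minpartfreq` default; and the sample standard deviation is used. The names `asdsf` and `converged` are hypothetical.

```python
from statistics import mean, stdev

Split = frozenset[str]  # taxa on one side of a bipartition (illustrative encoding)


def asdsf(runs: list[dict[Split, float]], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across two or more runs."""
    if len(runs) < 2:
        raise ValueError("need at least two independent runs")
    all_splits = set().union(*runs)  # every split seen in any run
    sds = []
    for split in all_splits:
        # A split absent from a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) < min_freq:
            continue  # skip splits that are rare in every run (assumed cutoff)
        sds.append(stdev(freqs))  # sample standard deviation across runs
    return mean(sds) if sds else 0.0


def converged(runs: list[dict[Split, float]], threshold: float = 0.01) -> bool:
    """Apply the < 0.01 ASDSF criterion used in the text."""
    return asdsf(runs) < threshold
```

Two runs that agree closely on every well-supported split yield a small ASDSF; a split found at 0.9 in one run and 0.5 in the other would push the average well above 0.01.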
The same analysis was repeated with the covariotide model of sequence evolution [], which has been shown to effectively accommodate heterotachy (changes in site-specific evolutionary rates across lineages) [[], but see []]. Because this analysis took longer to converge, it was run for 5 million generations.

Maximum likelihood analyses were run on the PhyML 3.0 webserver []. We applied the GTR+I+Γ model of nucleotide substitution to the unpartitioned dataset, estimated the proportion of invariable sites and the gamma shape parameter (with six substitution rate categories), and optimized equilibrium frequencies, branch lengths, and tree topology (using nearest-neighbor interchange [NNI] and subtree pruning and regrafting [SPR]) from five random starting trees. In addition to running 100 bootstrap replicates to estimate branch support, we computed the SH-like approximate likelihood-ratio test (aLRT), which assesses whether the likelihood gain from including a given branch is significant []. To accommodate non-stationarity (changes in base frequencies between branches), we ran nhPhyML-Discrete [] with default options, using the topology recovered from the heterotachous Bayesian analysis as the starting tree.

The evolution of tRNA-N was studied with phylogenetic analyses, as suggested by Saks et al. [] and Dowton and Austin [], run in PhyML with 100 bootstrap replicates and the same configuration as described above. Other relevant hymenopteran tRNAs (D, N, and V) were downloaded from GenBank and aligned with MUSCLE. Following the suggestions of Wong et al. [], and unlike other authors [,,], we did not remove unpaired loops and anticodons. However, this phylogenetic analysis should be interpreted only as a heuristic tool, since aligning many very short, evolutionarily very old, and highly AT-biased sequences is not trivial, regardless of the alignment method used or any prior editing to remove problematic regions. […]
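PhyML generates its bootstrap replicates internally; the sketch below only illustrates what a single nonparametric bootstrap pseudo-replicate is: alignment columns resampled with replacement up to the original length, with the same column indices used for every taxon so that the characters within a column stay together. The dictionary-based alignment format and the name `bootstrap_replicates` are illustrative assumptions, not part of PhyML.

```python
import random
from typing import Iterator, Optional


def bootstrap_replicates(
    alignment: dict[str, str], n_reps: int = 100, seed: Optional[int] = None
) -> Iterator[dict[str, str]]:
    """Yield n_reps pseudo-alignments made by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    (length,) = lengths
    rng = random.Random(seed)  # seeded for reproducible replicates
    for _ in range(n_reps):
        # Draw column indices with replacement; reuse them for every taxon.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
```

Each replicate is then analyzed exactly like the original alignment, and the bootstrap support for a branch is the percentage of replicate trees that contain it.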