Computational protocol: Diversification of the Genus Anopheles and a Neotropical Clade from the Late Cretaceous

Protocol publication

[…] Each gene was aligned individually using the program MAFFT 7 []. Alignments were then inspected and edited in MEGA version 5.1 []. Individual alignments were then concatenated in the SeaView 4 program [] assuming species-level monophyly. The final alignment matrix included 157 bp of the 5.8S marker, 525 bp of the first segment of COI and 562 bp of the second segment, and 684 bp of COII, for a total length of 1,928 bp. The final alignment is available online at the Dryad database and at www.edarwin.net/data/Anopheles. Two methods of phylogenetic reconstruction were implemented using the GTR+G+I substitution model as indicated by the jModelTest2 program []. The first was a Bayesian inference (BI) method conducted in the program MrBayes 3.2 []. The Markov Chain Monte Carlo (MCMC) algorithm was executed in two independent runs. Each run was sampled every 1,000th generation until 10,000 trees were obtained, with 25% excluded as burn-in. In this tree, the clade Bayesian posterior probability (BP) was used as a metric of topological support. Convergence of the chains was assessed via the potential scale reduction factor, which was close to 1.0 for all parameters, and the effective sample size (ESS), which was > 200 for all parameters. The second method was maximum likelihood (ML). In this case, the algorithm was implemented in the PhyML program package 3 [], and the topological support test was the approximate likelihood ratio statistic, aLRT [].

We also investigated whether data partitioning would affect topological inference. The partitioning scheme was inferred with the PartitionFinder software [] by searching through all substitution models and using the Bayesian information criterion (BIC) to choose between alternative models. Three data blocks were tested, namely 5.8S, COI and COII, and the greedy search algorithm was used.
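As a rough illustration of how the three data blocks could map onto the 1,928 bp matrix, the sketch below derives 1-based column ranges from the segment lengths reported above. The concatenation order (5.8S, COI first segment, COI second segment, COII) and the segment labels are assumptions made for this example; the text does not state the order used in the actual alignment.

```python
# Segment lengths (bp) as reported in the text.
# ASSUMPTION: segments are concatenated in this order; labels are hypothetical.
SEGMENTS = [
    ("5.8S", 157),
    ("COI_part1", 525),
    ("COI_part2", 562),
    ("COII", 684),
]


def block_ranges(segments):
    """Return {name: (start, end)} using 1-based, inclusive column coordinates."""
    ranges = {}
    start = 1
    for name, length in segments:
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


ranges = block_ranges(SEGMENTS)
total = sum(length for _, length in SEGMENTS)

# The two COI segments are merged into a single COI data block.
data_blocks = {
    "5.8S": ranges["5.8S"],
    "COI": (ranges["COI_part1"][0], ranges["COI_part2"][1]),
    "COII": ranges["COII"],
}

for name, (start, end) in data_blocks.items():
    print(f"{name}: columns {start}-{end}")
print(f"total length: {total} bp")
```

Under the assumed order, the segment lengths sum to the reported 1,928 bp, and the mitochondrial blocks (COI and COII) occupy one contiguous stretch of the matrix.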
The best partitioning scheme was composed of two partitions, a mitochondrial partition containing COI and COII, under the GTR+G+I model, and a single partition for 5.8S, under the K80+G model. Phylogenetic inference using the estimated partitioning scheme was conducted in MrBayes, using the same MCMC settings as above, and also in RAxML 8 [], which implements a fast maximum likelihood topological search. [...] The molecular dating analysis was conducted in a Bayesian framework with the program BEAST 1.7.8 [], which also uses an MCMC algorithm to infer the posterior distribution of the parameters. As in the phylogenetic inference, the selected model for nucleotide substitution was GTR+G+I. The prior distribution of the evolutionary rates along branches was modeled by the uncorrelated lognormal distribution, whereas the Yule process was adopted to model the tree prior. The MCMC run consisted of 100,000,000 generations with parameters sampled every 1,000th step. A burn-in period of 25,000 generations was discarded. The BEAST analysis was repeated twice to check for convergence, which was assessed by the potential scale reduction factor as implemented in the coda package of the R programming environment (www.r-project.org). ESSs were also calculated in Tracer 1.6, resulting in values > 200 for all parameters.

To decompose the branch lengths (i.e., genetic distances) into absolute times and evolutionary rates, calibration priors on node ages are required. Usually, these priors are obtained from the fossil record or from the mean evolutionary rate. As with most non-vertebrate taxa, however, the Anopheles fossil record is very scarce because only two Anopheles fossils are currently recognized. The oldest record is Anopheles (Nyssorhynchus) dominicanus from the Late Eocene (33.9–41.3 Ma) [], and the most recent is Anopheles rottensis from the Late Oligocene (13.8–33.9 Ma) []. Nevertheless, the use of these records as time priors has been deemed notably problematic.
Although the A. dominicanus fossil was assigned to subgenus Nyssorhynchus, estimates of the fossil's age vary from 15 Ma to 45 Ma, depending on the dating technique applied []. Thus, as in many studies with mosquitoes, we relied on the split between Aedes and Anopheles, which was estimated at 145 Ma (97.7–193.7) by Logue et al. [], in which the timescale was calibrated using the estimate of the Drosophila-Anopheles divergence at 260 Ma obtained by Gaunt and Miles []. Accordingly, a Gaussian calibration prior with mean = 145 Ma and standard deviation = 25 Ma was adopted to accommodate the 97.7–193.7 range within the 95% highest probability density interval.

For the ancestral area reconstruction analysis, we first assigned each terminal taxon to one of the following area categories: (1) Americas; (2) Africa; (3) Europe; (4) India plus West Asia; and (5) Southeast Asia plus the Pacific. Geographical areas were categorized according to Sinka et al. (2012), in which comprehensive distribution data for dominant malaria vectors were gathered and an Anopheles global map was created. Ancestral reconstruction was implemented using the ML method [] available in the APE package [] of the R programming environment. We also implemented the ancestral geographic range estimation method using the Lagrange software []. […]
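The choice of standard deviation for the Gaussian calibration prior on the Aedes-Anopheles split can be checked numerically. The minimal sketch below uses only the Python standard library and is not part of the original analysis; it computes the central 95% interval of a normal distribution with mean 145 Ma and standard deviation 25 Ma. For a symmetric, unimodal density such as the normal, the central interval coincides with the highest density interval.

```python
from statistics import NormalDist

# Calibration prior as stated in the text: mean 145 Ma, SD 25 Ma.
prior = NormalDist(mu=145.0, sigma=25.0)

# Central 95% interval (equal to the highest density interval for a normal).
lower = prior.inv_cdf(0.025)
upper = prior.inv_cdf(0.975)

# Range reported by Logue et al. that the prior is meant to accommodate.
target_low, target_high = 97.7, 193.7

print(f"95% interval: {lower:.1f}-{upper:.1f} Ma")
print("covers reported range:", lower <= target_low and upper >= target_high)
```

The interval is roughly 145 ± 1.96 × 25 Ma, which is slightly wider than the 97.7–193.7 Ma range on both sides.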

Pipeline specifications