Computational protocol: Pan-African phylogeny of Mus (subgenus Nannomys) reveals one of the most successful mammal radiations in Africa


Protocol publication

[…] Sequences of CYTB and IRBP were edited and aligned in SeqScape v2.5 (Applied Biosystems), producing final alignments of 1140 bp and 1276 bp, respectively. The Findmodel web application (http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html) was used to identify the most appropriate substitution model for each gene. A comparison of 12 substitution models using the Akaike information criterion (AIC) showed that the General Time Reversible model with gamma-distributed rate variation across sites (GTR + G) fitted the ingroup data best for both CYTB and IRBP. As outgroups, we used sequences of four species from other subgenera of the genus Mus, i.e. M. platythrix (CYTB GenBank accession AJ698880, IRBP GenBank accession AJ698895), M. pahari (AY057814, AJ698893), M. caroli (AB033698, AJ698885) and M. musculus (V00711, AF126968); two sister lineages of the genus Mus within the subfamily Murinae, i.e. Apodemus flavicollis (AB032853, AB032860) and Rattus norvegicus (V01556, AJ429134); and one species from the subfamily Acomyinae, Acomys cahirinus (AJ233953, AJ698898); see also [,,].
Phylogenetic relationships within Nannomys were inferred by maximum likelihood (ML) and Bayesian inference (BI). The ML analysis was performed in RAxML 8.0 []. The GTR + G model (option -m GTRGAMMA) was applied to six partitions: the 1140 bp of CYTB and the 1276 bp of IRBP, each further partitioned by codon position (option -q). Node support was evaluated with the default bootstrap procedure using 1,000 replicates (option -# 1000). The Bayesian analysis was performed with the Markov chain Monte Carlo (MCMC) method in MrBayes v. 3.2.1 []. Three heated chains and one cold chain were used in all analyses, and runs were started from random trees. Two independent runs of 5 million generations each were conducted, with trees and parameters sampled every 1,000 generations.
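The AIC comparison that Findmodel performs for each gene can be illustrated with a short sketch. It assumes hypothetical log-likelihoods (not values from this study) and counts only free substitution-model parameters, since branch lengths are shared by all candidate models; the model with the lowest AIC = 2k − 2 ln L is preferred.

```python
"""Sketch: pick a substitution model by AIC, as Findmodel does per gene.

The log-likelihoods below are hypothetical placeholders, not values from
this study. k counts free substitution-model parameters only; branch
lengths are common to all candidates and are omitted.
"""


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


# Hypothetical model -> (ln L, number of free parameters) for one gene.
candidates = {
    "JC": (-5210.4, 0),
    "HKY+G": (-4870.2, 5),   # kappa, 3 base frequencies, gamma shape
    "GTR": (-4902.9, 8),     # 5 exchangeabilities, 3 base frequencies
    "GTR+G": (-4851.7, 9),   # GTR plus the gamma shape parameter
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} AIC = {score:.1f}")
print("best model:", best)
```

Only 4 of the 12 candidate models are shown; the full comparison works the same way over the complete set.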
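The six codon-position partitions passed to RAxML with -q can be written as a partition file. The sketch below assumes, since the protocol does not say, that CYTB occupies columns 1-1140 and IRBP columns 1141-2416 of a concatenated matrix and that both genes begin at a first codon position; the partition names are hypothetical.

```python
"""Sketch: build RAxML partition-file lines (for option -q) that split
the concatenated CYTB + IRBP alignment into six codon-position partitions.

Assumptions (not stated in the protocol): CYTB is columns 1-1140, IRBP is
columns 1141-2416, and each gene starts at codon position 1. Partition
names are hypothetical.
"""

GENES = [("CYTB", 1140), ("IRBP", 1276)]


def codon_partitions(genes):
    """Return one 'DNA, name = start-end\\3' line per gene and codon position."""
    lines = []
    start = 1
    for name, length in genes:
        end = start + length - 1
        for pos in range(3):
            # "a-b\3" selects every third column from column a up to column b.
            lines.append(f"DNA, {name}_pos{pos + 1} = {start + pos}-{end}\\3")
        start = end + 1
    return lines


partition_lines = codon_partitions(GENES)
print("\n".join(partition_lines))
```

Saved to a text file, these lines would be supplied with -q alongside -m GTRGAMMA and -# 1000 as described above; RAxML 8 also expects random-number seeds (for example -p), which the protocol does not report.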
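The MCMC settings described above can be expressed as a MrBayes command block. This is a minimal sketch, assuming a single GTR + G model for the whole matrix (the protocol does not state whether the Bayesian analysis was partitioned) and a data matrix supplied elsewhere in the same NEXUS file. It also works out how many trees each run samples and roughly how many the 10% burn-in removes.

```python
"""Sketch: MrBayes 3.2 command block for the MCMC settings in the text.

Assumptions: one GTR + G model for the whole matrix (partitioning of the
Bayesian analysis is not stated) and a DATA block defined elsewhere.
"""

NGEN = 5_000_000
SAMPLEFREQ = 1_000
BURNIN_FRACTION = 0.10


def mrbayes_block() -> str:
    commands = [
        "begin mrbayes;",
        # nst=6 with rates=gamma specifies GTR + G.
        "  lset nst=6 rates=gamma;",
        # nchains=4 gives one cold and three heated chains in each run.
        f"  mcmc ngen={NGEN} nruns=2 nchains=4 samplefreq={SAMPLEFREQ};",
        # Summaries discard the first 10% of samples from each run.
        f"  sump relburnin=yes burninfrac={BURNIN_FRACTION};",
        f"  sumt relburnin=yes burninfrac={BURNIN_FRACTION};",
        "end;",
    ]
    return "\n".join(commands)


# Each run records generation 0 and then every SAMPLEFREQ generations.
samples_per_run = NGEN // SAMPLEFREQ + 1
# Approximate count; MrBayes's own rounding may differ by one sample.
discarded_per_run = int(samples_per_run * BURNIN_FRACTION)

print(mrbayes_block())
print("samples per run:", samples_per_run)
print("discarded per run:", discarded_per_run)
```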
Convergence was checked in TRACER v1.5 []. For each run, the first 10% of sampled trees were discarded as burn-in. Bayesian posterior probabilities were used to assess branch support in the Bayesian tree.
The most widespread Nannomys species (= MOTU, see below) is M. minutoides, for which we performed a more detailed analysis of intraspecific genetic variability. We selected 131 sequences belonging to this clade and trimmed the final alignment to 741 bp. Haplotypes were generated in DnaSP [], and a median-joining network of haplotypes was constructed in Network 4.6.1.2 (downloaded on 10.2.2014 from http://www.fluxus-engineering.com/sharenet.htm). [...]
We estimated the possible number of putative species of Nannomys in the sampled dataset (referred to here as molecular operational taxonomic units, MOTUs, until thorough taxonomic evidence is available) by applying two types of divergence threshold to the CYTB dataset. The first was the time threshold estimated by the Generalized Mixed Yule Coalescent (GMYC) model [], which describes the branching pattern of a single locus as a succession of speciation events that is replaced, at a fixed threshold time, by a succession of intraspecific coalescent events. The two stages are modelled by a Yule process and a neutral coalescent, respectively, which makes it possible to find the maximum likelihood estimate of the threshold time and to evaluate statistical support for the delimited species [,]. In this framework, reliably delimited species are those whose basal internal split occurred well after the speciation-coalescence threshold and which diverged from their sister species well before it.
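Posterior probabilities after burn-in amount to clade frequencies in the retained trees. The sketch below simplifies each sampled tree to a set of clades (frozensets of taxon names) and pools two runs after dropping the first 10% of each; parsing real MrBayes tree files is not shown, and the taxa and samples are hypothetical.

```python
"""Sketch: posterior probability of a clade = fraction of post-burn-in
sampled trees that contain it, pooled over independent runs.

Trees are simplified to sets of clades (frozensets of taxon names);
the taxa and samples are hypothetical toy data.
"""
from collections import Counter


def drop_burnin(samples, fraction=0.10):
    """Discard the first `fraction` of the samples from one run."""
    return samples[int(len(samples) * fraction):]


def clade_posteriors(runs, fraction=0.10):
    """Pool post-burn-in trees from all runs and return clade frequencies."""
    kept = [tree for run in runs for tree in drop_burnin(run, fraction)]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


A = frozenset({"t1", "t2"})
B = frozenset({"t1", "t3"})

# Two hypothetical runs of 10 samples; the first sample of each is burn-in.
run1 = [{B}] + [{A}] * 9
run2 = [{B}] + [{A}] * 8 + [{B}]

pp = clade_posteriors([run1, run2])
print(f"PP(A) = {pp[A]:.2f}, PP(B) = {pp[B]:.2f}")
```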
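Collapsing the trimmed M. minutoides sequences into haplotypes is a simple grouping step, sketched below with toy data. The sketch assumes trimming keeps the first 741 columns (the protocol does not say which window was kept) and uses hypothetical sequence IDs; it only counts pairwise differences between haplotypes and does not reproduce the median-joining algorithm of Network.

```python
"""Sketch: trim aligned sequences, collapse identical ones into haplotypes,
and count pairwise differences between haplotypes.

Assumptions: trimming keeps the first `length` columns; IDs and sequences
are hypothetical. Median-joining network construction is not shown.
"""
from collections import defaultdict
from itertools import combinations

TRIM_LENGTH = 741  # alignment length used for the M. minutoides network


def collapse_haplotypes(alignment, length=TRIM_LENGTH):
    """Map each distinct trimmed sequence to the IDs that carry it."""
    groups = defaultdict(list)
    for seq_id, seq in alignment.items():
        groups[seq[:length].upper()].append(seq_id)
    return dict(groups)


def mismatches(a, b):
    """Number of differing sites between two equal-length haplotypes."""
    return sum(x != y for x, y in zip(a, b))


toy_alignment = {
    "s1": "ACGTTAGC",
    "s2": "ACGTTAGT",
    "s3": "ACGTCAGC",
    "s4": "acgttagc",
}
# A short trim length keeps the toy example readable.
haplotypes = collapse_haplotypes(toy_alignment, length=6)

for i, (hap, ids) in enumerate(haplotypes.items(), start=1):
    print(f"Hap_{i}: {hap} carried by {', '.join(ids)}")

for (i, h1), (j, h2) in combinations(enumerate(haplotypes, start=1), 2):
    print(f"Hap_{i} vs Hap_{j}: {mismatches(h1, h2)} difference(s)")
```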
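How a single GMYC threshold time divides an ultrametric tree into putative species can be shown with a toy tree. In this sketch, nodes older than the threshold are read as speciation events, and each maximal clade whose root is younger than the threshold (or a lone tip) forms one cluster. The tree and ages are hypothetical, and the maximum likelihood search for the threshold, which the R package 'splits' performs, is not reproduced.

```python
"""Sketch: partition an ultrametric tree into clusters at a threshold time,
as in the single-threshold GMYC model.

The tree and node ages are hypothetical; estimating the threshold by
maximum likelihood is not shown.
"""
from typing import List, Tuple, Union

# A node is either a tip name or (age before present, list of children).
Node = Union[str, Tuple[float, list]]


def tips(node: Node) -> List[str]:
    """All tip names below a node."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [name for child in children for name in tips(child)]


def threshold_clusters(node: Node, threshold: float) -> List[List[str]]:
    """Clusters = maximal clades younger than the threshold, or lone tips."""
    if isinstance(node, str):
        return [[node]]  # a tip reached from an older node is its own cluster
    age, children = node
    if age < threshold:
        return [tips(node)]  # the whole clade lies in the coalescent stage
    clusters = []
    for child in children:  # an older node is treated as a speciation event
        clusters.extend(threshold_clusters(child, threshold))
    return clusters


tree = (5.0, [
    (3.0, [(0.4, ["a1", "a2"]), (0.3, ["b1", "b2"])]),
    (0.2, ["c1", "c2"]),
    "d1",
])

for t in (1.0, 4.0):
    print(t, threshold_clusters(tree, t))
```

Moving the threshold from 1.0 to 4.0 merges the a and b clades, which is why the support for each split depends on where the threshold is placed.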
We therefore calculated two kinds of support: (1) for each intraspecific basal split, the relative likelihood that it represents a coalescent rather than a speciation event, obtained by summing the Akaike weights of all threshold times older than or equal to its age; (2) for each interspecific split, the relative likelihood that it represents speciation, obtained as the sum of the Akaike weights of all threshold times younger than it. The ultrametric tree required by GMYC was produced in BEAST 1.8.0 [] with an uncorrelated lognormal distribution of substitution rates and lognormal priors on node ages mimicking the posteriors from the divergence dating (see below). We used a Yule prior, which assumes no intraspecific divergences (alternative analyses with a coalescent prior, which assumes no speciation events, led to almost identical GMYC results; not shown). The topology was constrained to match the branching order of the main lineages observed in the maximum likelihood phylogeny. The GMYC analysis was performed with the R package ‘splits’ (http://r-forge.r-project.org/R/?group_id=333).
The second threshold was based on sequence divergence, taken as a proxy for the amount of genetic difference among distinct gene pools. We therefore analyzed the distribution of Kimura 2-parameter (K2P) corrected genetic distances in CYTB among GMYC-delimited species (calculated in MEGA 5.05; []) and merged lineages separated by less than 7.3% genetic distance, i.e. the mean distance between sister species of rodents []. The resulting groups were designated as molecular operational taxonomic units (MOTUs) and given provisional names. Note that the aim of our MOTU delimitation approach is not to change the current taxonomy but to highlight taxa and geographical areas that merit further taxonomic study, including morphological, ecological and more detailed genetic approaches. […]
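The two support measures described above can be computed from Akaike weights over candidate threshold times. This sketch assumes hypothetical threshold ages (in the time units of the ultrametric tree) and hypothetical AIC values; in practice these would come from the GMYC likelihood surface. The weights follow w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the AIC difference from the best threshold.

```python
"""Sketch: Akaike-weight support for GMYC splits.

Candidate threshold ages and their AIC values are hypothetical; in
practice they would be taken from the GMYC likelihood surface.
"""
import math


def akaike_weights(aic_values):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)."""
    best = min(aic_values)
    relative = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def coalescence_support(split_age, thresholds, weights):
    """Sum of weights of thresholds older than or equal to a basal split."""
    return sum(w for t, w in zip(thresholds, weights) if t >= split_age)


def speciation_support(split_age, thresholds, weights):
    """Sum of weights of thresholds younger than an interspecific split."""
    return sum(w for t, w in zip(thresholds, weights) if t < split_age)


thresholds = [0.8, 1.0, 1.2]       # hypothetical candidate threshold ages
aic_values = [102.0, 100.0, 104.0]  # hypothetical AIC for each threshold
weights = akaike_weights(aic_values)

print(f"coalescence support, basal split at 0.9: "
      f"{coalescence_support(0.9, thresholds, weights):.3f}")
print(f"speciation support, split at 1.1: "
      f"{speciation_support(1.1, thresholds, weights):.3f}")
```

For any single split age the two measures sum to 1, because every candidate threshold is either older than or equal to the split, or younger than it.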
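The K2P distance used for the divergence threshold corrects the raw proportions of transitions (P) and transversions (Q) as d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch below is a plain implementation on toy sequences, assuming that sites with gaps or ambiguity codes are skipped pairwise; MEGA offers several missing-data options, and which one was used is not stated.

```python
"""Sketch: Kimura 2-parameter (K2P) distance between two aligned sequences.

Assumption: sites with gaps or ambiguity codes are skipped pairwise.
The sequences are toy examples.
"""
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / compared  # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(f"K2P distance: {d:.4f} ({d * 100:.1f}%)")
```

The function raises an error when no sites can be compared or when divergence is saturated (a non-positive logarithm argument); a production version would need to handle both cases.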
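Merging GMYC-delimited lineages that differ by less than 7.3% can be done with a union-find structure. This sketch assumes that merging is transitive (single linkage), so if A-B and B-C both fall below the threshold, A, B and C end up in one MOTU; the protocol does not state how chains of close lineages were handled. Lineage names and distances are hypothetical.

```python
"""Sketch: merge lineages whose pairwise K2P distance is below 7.3%.

Assumption: merging is transitive (single linkage). Lineage names and
distances are hypothetical.
"""

THRESHOLD = 0.073  # mean distance between sister species of rodents


def merge_lineages(names, distances, threshold=THRESHOLD):
    """Group lineages connected by distances below the threshold."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), d in distances.items():
        if d < threshold:
            parent[find(a)] = find(b)  # join the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


lineages = ["L1", "L2", "L3", "L4"]
pairwise = {
    ("L1", "L2"): 0.041, ("L1", "L3"): 0.112, ("L1", "L4"): 0.120,
    ("L2", "L3"): 0.098, ("L2", "L4"): 0.105, ("L3", "L4"): 0.069,
}
print(merge_lineages(lineages, pairwise))
```

Under a different rule, such as merging only mutually closest pairs, chained lineages could yield different MOTUs, so the linkage choice matters when distances cluster near the threshold.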

Pipeline specifications

Software tools SeqScape, RAxML, MrBayes, DnaSP, BEAST, MEGA
Applications Phylogenetics, Sanger sequencing
Organisms Mus musculus, Homo sapiens