Computational protocol: Data on phylogenetic analyses of gazelles (genus Gazella) based on mitochondrial and nuclear intron markers

Protocol publication

[…] We designed new primers to amplify introns of the nuclear-encoded genes zinc finger protein 618 (ZNF618), epidermal growth factor receptor substrate 15-like 1 (EPS15L1), SPARC-related modular calcium-binding protein 1 (SMOC1), pantothenate kinase 4 (PANK4), NACHT, LRR and PYD domains-containing protein 2 (NLRP2) and chromodomain-helicase-DNA-binding protein 2 (CHD2). We used the sheep (Ovis aries) genome, available on the website of the International Sheep Genomics Consortium (http://www.livestockgenomics.csiro.au/sheep/oar1.0.php), and the cattle (Bos taurus) genome, available from the Ensembl genome database (http://www.ensembl.org/Bos_taurus/Info/Index). We searched the sheep genome for annotated protein-coding genes and used the provided Swiss-Prot number to search for the corresponding gene sequences in the cattle genome. If those sequences contained introns between 400 and 1000 bp long, we assembled the exons of the respective gene with the complete gene sequence of sheep using Geneious Pro 5.4.2 (Biomatters Ltd., available from http://www.geneious.com). Primers were subsequently placed in exon regions conserved between cattle and sheep so that the resulting sequences spanned at least one intron. To avoid linkage disequilibrium, we only used genes located on different chromosomes. Primers were designed using the Oligonucleotide Properties Calculator and the reverse complement converter (http://www.bioinformatics.org/sms/rev_comp.html). All primers were synthesized by Eurofins MWG Synthesis GmbH. [...] DNA was extracted using the Qiagen DNeasy blood and tissue kit according to the manufacturer’s protocol. Sequences were obtained by Sanger sequencing, and newly established sequences were deposited in GenBank. We aligned sequences with MUSCLE (gapopen=−400; gapextend=−200). In total, the concatenated alignment consisted of 4,623 nucleotides.
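The intron-length filter and the reverse-complement step described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names, the coordinate convention (0-based, end-exclusive, sorted exons on one strand) and the example coordinates are all assumptions made for the sketch.

```python
# Hypothetical helpers illustrating two steps of the primer-design procedure.
# Names and coordinate conventions are assumptions, not part of the protocol.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def introns_in_range(exons, min_len=400, max_len=1000):
    """Return introns whose length lies in [min_len, max_len] bp.

    `exons` is a list of (start, end) tuples, 0-based and end-exclusive,
    sorted by position. An intron is the gap between consecutive exons.
    """
    selected = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        length = next_start - prev_end
        if min_len <= length <= max_len:
            selected.append((prev_end, next_start))
    return selected


if __name__ == "__main__":
    # Made-up exon coordinates: one 500 bp intron and one 1300 bp intron.
    exons = [(0, 100), (600, 700), (2000, 2100)]
    print(introns_in_range(exons))
    # A reverse primer is written as the reverse complement of the
    # downstream exon sequence it anneals to (made-up sequence).
    print(reverse_complement("ATGCCGTA"))
```

Only the first intron in the example passes the 400 to 1000 bp filter, so a primer pair would be placed in the two exons flanking it.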
The Cyt b gene partition was translated into amino acid sequences and checked for stop codons that would indicate potential pseudogenes. The alignments for the six nuclear introns of the genes ZNF618, EPS15L1, SMOC1, PANK4, NLRP2 and CHD2 and for the mitochondrial cytochrome b (Cyt b) gene are provided in the data article. [...] Phylogeny and divergence times were estimated with a Bayesian approach in BEAST MC3 1.7.5. Additionally, we inferred a species tree using a coalescence approach on the multiple loci as implemented in the *BEAST algorithm; this species tree was used for the subsequent estimation of ancestral characters (1000 trees) and ancestral ranges (maximum clade credibility tree). Molecular clock rates and substitution schemes were unlinked between partitions. We inferred the most likely substitution model for each marker using jModelTest 2.1.3, considering models with equal/unequal base frequencies and with/without rate variation among sites (base tree for likelihood calculations=ML tree; tree topology search operation=NNI; the best model was inferred based on the Akaike Information Criterion). This resulted in an HKY+G model of sequence evolution for all genes except PANK4, for which an HKY model was selected. We applied a Yule tree prior to account for independently evolving lineages. We chose an uncorrelated log-normal relaxed molecular clock using an external substitution rate for the Cyt b gene (normally distributed rate with a mean of 1.50±0.15% per Ma; 5–95% interquantile range: 1.25–1.75% per Ma). This rate was estimated based on four different alignments of primate protein-coding mitochondrial sequences and fossil calibration points for six primate data sets using a Bayesian approach. For the more conserved nuclear genes, reliable external rates were not available, and so we assumed a very broad exponentially distributed prior with a mean of 0.1% per Ma (5–95% interquantile range: 0.01–0.30% per Ma). We ran three chains for 50 M iterations, sampling every 10,000th iteration.
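The interquantile ranges quoted for the two rate priors can be checked with a few lines of arithmetic. The sketch below assumes that the ±0.15 in the Cyt b prior is the standard deviation of the normal distribution and that the exponential prior is parameterised by its mean; the function names are hypothetical. Under these assumptions, a mean of 0.1% per Ma reproduces the quoted 0.01–0.30% range for the nuclear markers.

```python
import math
from statistics import NormalDist


def normal_interval(mean, sd, lo=0.05, hi=0.95):
    """Quantiles of a normal prior (assumes `sd` is the standard deviation)."""
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(lo), dist.inv_cdf(hi)


def exponential_interval(mean, lo=0.05, hi=0.95):
    """Quantiles of an exponential prior parameterised by its mean.

    The quantile function is Q(p) = -mean * ln(1 - p).
    """
    return -mean * math.log(1.0 - lo), -mean * math.log(1.0 - hi)


if __name__ == "__main__":
    # Cyt b prior: normal, mean 1.50, assumed sd 0.15 (% per Ma).
    print([round(q, 2) for q in normal_interval(1.50, 0.15)])
    # Nuclear prior: exponential, mean 0.1 (% per Ma).
    print([round(q, 2) for q in exponential_interval(0.1)])
```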
Convergence of sampled parameters and potential autocorrelations (effective sample size >200 for all parameters) were investigated in Tracer 1.6. We discarded the first 10% of sampled trees as burn-in. The maximum clade credibility tree was chosen and parameter values annotated using TreeAnnotator (part of the BEAST package). The resulting substitution rates were 0.97% per Ma for Cyt b (95% credibility interval, CI: 0.05–1.45%), 0.12% per Ma for EPS15L1 (CI: 0.05–0.19%), 0.17% per Ma for NLRP2 (CI: 0.08–0.27%), 0.16% per Ma for SMOC1 (CI: 0.04–0.32%), 0.21% per Ma for ZNF618 (CI: 0.1–0.32%) and 0.11% per Ma for PANK4 (CI: 0.05–0.18%). To confirm the tree topology calculated in BEAST, we also analyzed the concatenated data set with a Maximum Likelihood (ML) approach. The ML analysis was performed with RAxML 8.0.14 under a GTR+Γ model that was unlinked for all partitions. Support of nodes was assessed with 1,000 bootstrap replicates. Phylogenetic (Bayesian and ML) and species trees are provided in the data article. [...] We estimated ancestral characters for ecological and behavioral traits using a Bayesian approach to character evolution in BayesTraits multistate 1.0. The analysis was conducted with 1000 randomly selected post-burn-in trees to account for uncertainty in phylogenetic reconstruction; outgroups were removed with the exception of Antilope cervicapra (the sister group to Gazella). We estimated ancestral character states for three key ecological/behavioral traits: habitat type (mountainous vs. plain-dwelling), group size (small groups of <15 individuals vs. large herds), and movement patterns (sedentary vs. migratory; see input files). In addition, we reconstructed ancestral character states for the presence or absence of horns in females and for the occurrence of twinning. We ran the analysis for 20 M iterations, sampling every 10,000th iteration and discarding the first 10% as burn-in.
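The burn-in and tree-subsampling bookkeeping can be sketched as below. This is an illustration under stated assumptions rather than the authors' script: it treats a single chain, counts sampled states as iterations divided by the sampling interval (ignoring any extra initial state a program may log), and uses a hypothetical function name and a fixed random seed for reproducibility.

```python
import random


def post_burnin_sample(n_iterations, sample_every, burnin_percent, n_keep, seed=0):
    """Pick `n_keep` distinct post-burn-in sample indices at random.

    Indices refer to positions in the list of sampled states (0-based).
    Assumes n_iterations // sample_every states were logged.
    """
    n_sampled = n_iterations // sample_every
    n_burnin = n_sampled * burnin_percent // 100  # integer arithmetic
    post_burnin = range(n_burnin, n_sampled)
    rng = random.Random(seed)
    return sorted(rng.sample(post_burnin, n_keep))


if __name__ == "__main__":
    # 50 M iterations sampled every 10,000th state gives 5,000 trees;
    # a 10% burn-in removes 500, leaving 4,500 to draw 1,000 from.
    indices = post_burnin_sample(50_000_000, 10_000, 10, 1000)
    print(len(indices), indices[0] >= 500)
```

With the same arithmetic, the 20 M iteration character-evolution run sampled every 10,000th iteration yields 2,000 samples, of which 1,800 remain after the 10% burn-in.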
To specify the range of values used to seed the prior distribution, we applied an exponential hyperprior with a mean ranging from 0.0 to 0.5 and a rate deviation of 7 (twinning: 2; female horns: 6), resulting in mean acceptance rates between 20% and 40%. To further corroborate the ancestral state in the most recent common ancestor (MRCA) of the genus Gazella, we additionally applied a model testing approach. In separate runs, using the general MCMC settings described above, we constrained the ancestral condition of the MRCA of Gazella to each of the alternative states and compared the harmonic means of likelihoods (as estimators of marginal likelihoods) using the Bayes factor (BF). As harmonic means tend to be unstable, we repeated each run five times and calculated the BF from the arithmetic means. Result files of the ancestral character state estimation (ACSE) are supplied with the data article. […]
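The Bayes factor comparison can be illustrated with a short sketch. It assumes that each run reports the natural log of the harmonic mean of likelihoods and that the log BF is defined as twice the difference in log marginal likelihood, a common convention that the text does not state explicitly. The function name and all numbers are hypothetical.

```python
from statistics import fmean


def log_bayes_factor(log_hm_state_a, log_hm_state_b):
    """Return 2 * (mean log harmonic mean of A - mean log harmonic mean of B).

    Each argument holds the log harmonic means from replicate runs in which
    the MRCA was constrained to one state. The factor of 2 is an assumed
    convention; positive values favour state A.
    """
    return 2.0 * (fmean(log_hm_state_a) - fmean(log_hm_state_b))


if __name__ == "__main__":
    # Made-up log harmonic means from five replicate runs per constraint.
    state_a = [-10.0, -10.5, -9.5, -10.0, -10.0]
    state_b = [-12.0, -12.0, -12.0, -12.0, -12.0]
    print(log_bayes_factor(state_a, state_b))
```

Averaging the replicate values before taking the difference mirrors the description of calculating the BF from the arithmetic means of five runs.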

Pipeline specifications

Software tools: MUSCLE, BEAST, jModelTest, RAxML, BayesTraits
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Gazella gazella, Gazella bennettii, Gazella cuvieri, Gazella dorcas, Gazella leptoceros, Gazella spekei, Gazella subgutturosa, Antilope cervicapra, Eudorcas rufifrons, Bos taurus, Ovis aries