Computational protocol: Unveiling an ancient biological invasion: molecular analysis of an old European alien, the crested porcupine (Hystrix cristata)

Protocol publication

[…] The sequences were edited and aligned using BIOEDIT []. Haplotype diversity (Hd), number of segregating sites (S), nucleotide diversity averaged per site (π) [] and, for the cytochrome b gene, the synonymous/non-synonymous ratio were calculated using DNASP []. The same software was used to describe the number and frequency of the different haplotypes and to calculate θ per sequence from segregating sites [], a diversity value that gives an informative synopsis of the metrics used. Net genetic distances based on the proposed GTR + Γ + I [] model of sequence evolution were calculated on the whole analyzed mtDNA fragment by exporting tip-to-tip distances from the Maximum-Likelihood phylogeny calculated in TREEFINDER []. Maximum-Likelihood (ML) and Bayesian (Ba) methods were used to assess the phylogenetic relationships among mitochondrial haplotypes. TREEFINDER was used to select the best-fit model of sequence evolution for each analyzed data set, using the Akaike Information Criterion. For the whole dataset, the proposed model was GTR + Γ + I with unequal base frequencies. TREEFINDER was used to build the ML phylogeny, while BEAST [] was used to reconstruct the Ba phylogeny. Robustness of the ML trees was tested with 1,000 bootstrap replicates. The relationships among the haplotypes belonging to each tree sub-clade were reconstructed by means of a Median-Joining network analysis, as implemented in NETWORK [].

Following Muirhead et al. [], global Fst among native populations was used to assess the confidence index for classifying introduced individuals to a source geographic area. This check helps prevent incorrect identification of the invasion source when only a few individuals have been sampled in the putative source populations. Three putative African sources were considered: Tunisia-Libya, Morocco, and East Africa, the latter comprising Eritrean, Ethiopian and Tanzanian samples.
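As an illustration of the summary statistics named in this protocol (Hd, S, π and θ per sequence), the following minimal Python sketch computes them for a toy alignment. It is not the DNASP implementation: the sequences are invented, gaps and ambiguous bases are ignored, and θ is assumed here to be Watterson's estimator, since the protocol's citation for it is not preserved.

```python
from collections import Counter
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns showing more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))

def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences, averaged per site."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs) / len(seqs[0])

def theta_per_sequence(seqs):
    """Assumed Watterson estimator: S / sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n

# Invented toy alignment (equal-length sequences, no gaps).
toy_alignment = ["ACGTAC", "ACGTAC", "ACGTTC", "ATGTTC"]

print("S     =", segregating_sites(toy_alignment))
print("Hd    =", haplotype_diversity(toy_alignment))
print("pi    =", nucleotide_diversity(toy_alignment))
print("theta =", theta_per_sequence(toy_alignment))
```

Real alignments would first need the handling of gaps and missing data that dedicated packages provide.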
A Kimura 2-parameter distance method and 10,000 permutations were used to calculate global Fst with the software ARLEQUIN [].

TMRCAs of all Italian haplotypes and of 2 sub-clades were estimated on the basis of Control Region data using BEAST. The Constant_size (Cons_s) and Expansion_growth (Expa_g) tree priors were compared, as were the Relaxed and Strict molecular clock models. At least three independent runs, each 3 × 10^6 steps long, were performed for each dataset. Convergence of chains, effective sample size, estimates and credible intervals for each parameter were analyzed with the program TRACER []. TMRCAs were dated using a substitution rate of 0.4 substitutions/site/Myr. This rate was proposed by Rajabi-Maham et al. [] for intraspecific polymorphism analyses in Mus musculus domesticus. Since similar mitochondrial substitution rates have been recorded in Hystricidae and Muridae [], this rate was also adopted in our study.

The mtDNA sequences were analyzed in order to find traces of the demographic trend since introduction, the growth rate and the present population size. Several approaches were attempted. These analyses were performed only on the North Central Italy samples, which can be considered, to a large extent, a panmictic expanding population. Only the fast-evolving mtDNA CR-I was used in these investigations. Two neutrality tests, which have been shown to be the most powerful [], were used to test for population expansion or bottleneck in the sampled CR-I sequences: Fu's Fs [], as implemented in ARLEQUIN, and R2 [], as implemented in DNASP. Significance of Fs was assessed by 10,000 randomizations, while significance of R2 was determined by means of 10,000 coalescent simulations based on the observed number of segregating sites in the sample. The mismatch distribution of pairwise differences between sequences was examined to find genetic traces of the demographic trend.
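The observed mismatch distribution is the frequency of each number of pairwise differences across all pairs of sequences. A minimal Python sketch on invented sequences (not the CR-I data, and without the model-based expected curve that ARLEQUIN fits) could look like this:

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Return a list where entry k is the fraction of sequence pairs
    that differ at exactly k sites (k = 0 .. largest observed value)."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = sum(counts.values())
    return [counts.get(k, 0) / n_pairs for k in range(max(counts) + 1)]

# Invented toy alignment (equal-length sequences, no gaps).
toy_sequences = ["ACGTAC", "ACGTAC", "ACGTTC", "ATGTTC"]

for k, freq in enumerate(mismatch_distribution(toy_sequences)):
    print(f"{k} differences: {freq:.3f}")
```

A unimodal distribution is the pattern usually read as a signature of past expansion, but the sketch only tabulates the observed frequencies and performs no test.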
The sum-of-squared-differences (SSD) statistic was used to test the goodness-of-fit between the observed mismatch distribution and that expected under a sudden expansion model. The significance of SSD was assessed by 10,000 parametric bootstrap re-samplings [], using ARLEQUIN. Past population demography was also investigated with a Bayesian Skyline Plot (BSP) [], as implemented in BEAST. This coalescent-based approach estimates the effective breeding population size (Nef) through time directly from the sampled sequence data rather than from a previously inferred phylogeny. The procedure can therefore account for the uncertainty associated with the reconstructed phylogeny. The analysis was run with time segmentations ranging from 5 to 10 groups (past Nef points) and an HKY + I substitution model []. Three independent runs of 50 × 10^6 iterations were performed for each grouping scenario. In addition, separate runs were performed using different coalescent tree priors that model population size through time: Cons_s, Expa_g and Exponential_growth (Expo_g). Three independent runs of 30 × 10^6 iterations were done for each scenario. Convergence of chains, effective sample size, estimates and credible intervals for each parameter, and demographic reconstructions were analyzed with the software TRACER. The three demographic models were then compared with each other in order to assess their relative fit to the data. Comparisons were made in TRACER by evaluating the marginal likelihood ratio (marginal with respect to the tree prior) of pairs of models (Bayes Factor calculation). The best approximation to the marginal likelihood comparison was obtained by calculating the Bayes Factor on the tree likelihood trace [,]. […]
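To make the model comparison step concrete, here is a minimal Python sketch of a log Bayes Factor computed from two likelihood traces. It assumes the harmonic mean estimator of the marginal likelihood, a common but notoriously unstable approximation. Whether this matches the exact procedure applied in TRACER (which may add smoothing and bootstrap error estimates) is an assumption, and the trace values below are invented.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of the likelihoods, computed in log
    space: log(n) - log(sum(exp(-logL_i)))."""
    n = len(log_likelihoods)
    negated = [-x for x in log_likelihoods]
    shift = max(negated)  # log-sum-exp shift for numerical stability
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in negated))
    return math.log(n) - log_sum

def log_bayes_factor(trace_a, trace_b):
    """Log Bayes Factor of model A over model B from post-burn-in
    log-likelihood samples of each model."""
    return log_harmonic_mean(trace_a) - log_harmonic_mean(trace_b)

# Invented post-burn-in log-likelihood samples for two tree priors.
trace_constant = [-1502.3, -1501.8, -1503.1, -1502.0]
trace_expansion = [-1499.9, -1500.4, -1499.2, -1500.8]

print("log BF (expansion vs constant):",
      log_bayes_factor(trace_expansion, trace_constant))
```

A positive value favors the first model. Real traces contain thousands of samples, and the estimate is dominated by the lowest sampled likelihoods, which is why it should be interpreted with caution.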

Pipeline specifications