Computational protocol: Phylogeography of Camellia taliensis (Theaceae) inferred from chloroplast and nuclear DNA: insights into evolutionary history and conservation

Protocol publication

[…] Because some of the single nucleotide polymorphisms (SNPs) used in this study were discovered in close proximity to one another, they could not be treated as independent markers. For each set of linked SNP loci, we employed the Bayesian statistical method implemented in PHASE version 2.1.1 [,] to resolve the gametic phase of PAL sequences with multiple heterozygous SNPs. This program uses allele frequencies and the frequencies of known SNP haplotypes in each population to infer the probability of each possible haplotype from a group of linked SNPs. Five independent runs of 100 iterations each were performed, with all other parameters left at their defaults. The goodness-of-fit values were very similar among runs, indicating that the run lengths were sufficient. For each newly ‘phased’ locus, we selected, for each sample, the two haplotypes with the highest probability as assessed by PHASE. These haplotypes were then used as multi-allelic genotypes in further analyses. Only alleles and genotypes resolved with > 95% posterior probability were retained for subsequent analyses.

Sequences were proofread and aligned using CLUSTAL_X [] as implemented in BioEdit []. Indels in the cpDNA sequences were treated as substitutions, following Caicedo and Schaal [].

Global and population nucleotide diversity (π) [], haplotype diversity (h), the average number of nucleotide differences between sequences (K), and the number of polymorphic sites (S) were calculated using DNASP 4.10 []. Tajima's D [] and Fu & Li's D* [] neutrality tests were applied to determine whether each locus evolved neutrally. The minimum number of recombination events (RM) was assessed using the algorithm of Hudson and Kaplan [] in DNASP 4.10.

Nested clade phylogeographic analysis (NCPA) was performed following the approach [] in the program ANeCA [].
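The diversity statistics named above (S, K, π and h) can be illustrated with a minimal Python sketch. This is not the DnaSP implementation: it assumes gap-free aligned sequences of equal length, uses the standard textbook definitions (mean pairwise differences for K, K per site for π, and Nei's unbiased estimator for h), and the function name `diversity_stats` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, K, pi, h) for a list of aligned, gap-free sequences.

    Assumption: all sequences have the same length and contain no gaps;
    DnaSP's own handling of gaps and missing data is not reproduced here.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    # S: number of alignment columns showing more than one state.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)

    # K: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    K = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # pi: K expressed per site.
    pi = K / length

    # h: unbiased haplotype diversity, n/(n-1) * (1 - sum of squared frequencies).
    freqs = [count / n for count in Counter(seqs).values()]
    h = n / (n - 1) * (1 - sum(p * p for p in freqs))
    return S, K, pi, h


# Toy alignment (hypothetical data): four sequences, three distinct haplotypes.
S, K, pi, h = diversity_stats(["AAAA", "AAAT", "AAAT", "CAAT"])
print(S, K, pi, round(h, 4))
```

For the toy alignment, two columns are polymorphic, so S = 2, and the six pairwise comparisons differ at 6 sites in total, so K = 1 and π = 0.25.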
Significantly parsimonious connections were then constructed using the program TCS [], with a 95% parsimony connection limit. On the basis of the resulting network, nested clades were defined following the rules of Templeton et al. [] and Templeton & Sing []. The program GEODIS [] was used to test for geographic association of clades and nested clades against the null hypothesis of no association, at a 95% confidence level and with 10,000 permutations. Where significant values were detected, the inference key of Templeton [] was used to interpret the likely population processes and/or historical events within these clades.

The approximate divergence times between clades defined by nested clade phylogeographic analysis were estimated following Yuan et al. [], using T = dA/(2μ), where T is the divergence time, dA is the net pairwise divergence per base pair, and μ is the rate of nucleotide substitution []. dA was calculated using MEGA4 [] under the Kimura two-parameter model []. Because a substitution rate has not yet been estimated for the cpDNA genome of Camellia, the rate of 1.0–3.0 × 10⁻⁹ substitutions per site per year for synonymous cpDNA sites in seed plants [] was taken as a rough evolutionary rate for the rpl32-trnL intergenic spacers to date their divergence.

An analysis of molecular variance (AMOVA) [] was carried out with Arlequin 3.1 [] to determine the partitioning of variation within and between populations. Two measures of population differentiation, GST and NST, were compared using the U-statistic implemented in the program HAPLONST []. GST was estimated from haplotype frequencies, whereas NST also takes into account similarities between haplotypes (i.e. the number of mutations between haplotypes).
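As a worked illustration of the divergence-time formula given above, T = dA/(2μ), the following Python sketch turns a net divergence value into a range of dates using the 1.0–3.0 × 10⁻⁹ rate bounds from the text. The function name `divergence_time_range` and the example dA value are hypothetical and are not taken from the study.

```python
def divergence_time_range(d_a, mu_low=1.0e-9, mu_high=3.0e-9):
    """Return (youngest, oldest) divergence time in years from T = dA / (2 * mu).

    d_a      net pairwise divergence per base pair (e.g. a MEGA estimate)
    mu_low   slower substitution rate bound, substitutions/site/year
    mu_high  faster substitution rate bound, substitutions/site/year
    """
    if d_a < 0:
        raise ValueError("d_a must be non-negative")
    if mu_low <= 0 or mu_high < mu_low:
        raise ValueError("rates must satisfy 0 < mu_low <= mu_high")
    # A faster rate implies a more recent split, a slower rate an older one.
    youngest = d_a / (2 * mu_high)
    oldest = d_a / (2 * mu_low)
    return youngest, oldest


# Hypothetical example: dA = 0.003 substitutions per site between two clades.
youngest, oldest = divergence_time_range(0.003)
print(f"{youngest / 1e6:.2f} to {oldest / 1e6:.2f} million years")
```

With the hypothetical dA of 0.003, the two rate bounds give dates of about 0.5 and 1.5 million years. The width of the interval reflects only the uncertainty in μ, not the sampling error in dA.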
An NST significantly larger than GST indicates the presence of phylogeographical structure, with closely related haplotypes detected more frequently in the same area than distantly related ones.

The Mantel test implemented in Arlequin 3.1 [] was applied to examine the correlation between the natural logarithm of geographical distance and Slatkin's measure M [M = (1/FST − 1)/2], a measure of the extent of gene flow under an island model at equilibrium []. Statistical significance was tested with 10,000 permutations in Arlequin 3.1. […]
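The Mantel procedure described above can be sketched in plain Python as follows. This is an illustrative approximation rather than Arlequin's implementation: it assumes symmetric matrices supplied as nested lists, uses Pearson's correlation on the upper triangle, and reports a two-sided permutation p-value. The function names (`slatkin_m`, `pearson`, `mantel`) and all input values are hypothetical.

```python
import math
import random


def slatkin_m(fst):
    """Slatkin's M = (1/FST - 1)/2, in the form given in the text."""
    if not 0 < fst <= 1:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 2.0


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=10000, seed=0):
    """Return (r, p) for two symmetric n x n matrices (n >= 3).

    Rows and columns of b are permuted together; p is the share of
    arrangements (including the observed one) with |r| >= |r observed|.
    """
    n = len(a)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [a[i][j] for i, j in idx]
    r_obs = pearson(x, [b[i][j] for i, j in idx])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts once
    for _ in range(permutations):
        rng.shuffle(order)
        y = [b[order[i]][order[j]] for i, j in idx]
        if abs(pearson(x, y)) >= abs(r_obs):
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical inputs: pairwise distances (km) and pairwise FST for 4 populations.
km = [[0, 50, 120, 300], [50, 0, 90, 260], [120, 90, 0, 200], [300, 260, 200, 0]]
fst = [[0, 0.10, 0.20, 0.40], [0.10, 0, 0.15, 0.35],
       [0.20, 0.15, 0, 0.30], [0.40, 0.35, 0.30, 0]]
ln_km = [[math.log(d) if d else 0.0 for d in row] for row in km]
m = [[slatkin_m(f) if f else 0.0 for f in row] for row in fst]
r, p = mantel(ln_km, m, permutations=999)
print(round(r, 3), round(p, 3))
```

Diagonal entries are set to 0 only as placeholders, since the test reads the upper triangle alone. With four populations there are just 24 distinct arrangements, so the p-value in this toy case is coarse; the 10,000 permutations used in the study are meaningful only with more populations.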

Pipeline specifications

Software tools: Clustal W, BioEdit, DnaSP, MEGA, Arlequin
Application: Population genetic analysis
Organisms: Camellia taliensis