Computational protocol: Genetic Divergence among Regions Containing the Vulnerable Great Desert Skink (Liopholis kintorei) in the Australian Arid Zone

Protocol publication

[…] ND4 sequences were checked by eye and aligned with ClustalW, implemented in MEGA 5.0 [], and submitted to GenBank (accession numbers KM035773-KM035789). DNA sequences were then translated into amino acid sequences using the vertebrate mitochondrial code. No premature stop codons were observed, indicating that all sequences are true mitochondrial copies. Haplotype and nucleotide diversities were calculated in DnaSP [].

A minimum-spanning network of ND4 haplotypes was constructed in TCS 1.21 []. Global and pairwise ΦST, an analogue of FST [], were calculated from ND4 haplotypic data in Arlequin v3.5 [] with 1000 permutations.

Microsatellite alleles were visualised and scored using Peak Scanner 1.0 (Applied Biosystems). To ensure amplification and scoring consistency, at least 10% of samples at each locus were independently rerun and genotyped. Summary statistics, including exact tests for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD), were calculated in GenAlEx 6.4 [] and GENEPOP 4.2 []. Effective population size (Ne) estimates were calculated using the approximate Bayesian framework implemented in ONeSAMP v1.2 []. Because sample sizes at Docker River and Warburton were prohibitively small, these sampling localities were excluded from population-level analyses.

When FST analogues are calculated from highly polymorphic data such as microsatellites, within-population variance can approach the level of the total variance, resulting in very low FST values even when the populations share no alleles [,]. Following Hedrick [] and Meirmans [], pairwise fixation index values calculated from microsatellite data (hereafter F′ST) were standardised using the program RECODEDATA 0.1 [].

STRUCTURE v.2.3 [] was used to assess genotypic clustering and assignment probabilities. We examined values of K = 1–8 (double the number of sample sites included in the analysis), with 10 replicate runs for each, a burn-in of 10⁵ MCMC iterations and 10⁴ main iterations.
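Replicate STRUCTURE runs across a range of K are commonly summarised with the ΔK statistic of Evanno et al., which this protocol uses to choose K. The sketch below is a minimal illustration of that calculation, not the authors' workflow: it uses the common simplification of taking the second difference of the mean log-likelihoods, and the function name `delta_k` and all log-likelihood values are hypothetical toy inputs.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Map each interior K to |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)).

    lnp_by_k: dict mapping K to the list of ln P(D) values from replicate runs.
    L(K) is the mean ln P(D) over replicates; sd is taken across replicates.
    """
    mean_l = {k: mean(runs) for k, runs in lnp_by_k.items()}
    result = {}
    for k, runs in sorted(lnp_by_k.items()):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # undefined at the smallest and largest K tested
        spread = stdev(runs)  # sample standard deviation across replicates
        if spread == 0:
            continue  # identical replicates would cause a division by zero
        second_difference = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_difference / spread
    return result


# Hypothetical ln P(D) values (three replicates per K), for illustration only.
toy_lnp = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-900.0, -902.0, -898.0],
    3: [-895.0, -897.0, -893.0],
    4: [-893.0, -890.0, -896.0],
    5: [-892.0, -894.0, -890.0],
}

delta = delta_k(toy_lnp)
best_k = max(delta, key=delta.get)
print(best_k, delta)
```

Because ΔK depends on both neighbours of K, it cannot be evaluated at the lowest or highest K tested; in particular it cannot support K = 1.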
Hubisz et al. [] developed a model for STRUCTURE that allows the use of sample-site information. It differs from the earlier models with location priors in that it adds power to analyses but can disregard site information when the true clustering is uncorrelated with sampling locations. We used the ‘admixture’ model with correlated allele frequencies, and replicate runs were performed both with and without location information. The number of genetic clusters (K) was determined using the ΔK method of Evanno et al. [].

Discriminant analysis of principal components (DAPC) was used to describe the genetic relationships among sampling localities. DAPC is a multivariate analysis that first uses principal components analysis (PCA) to transform the data into uncorrelated components. These components are then analysed using a linear discriminant method, which minimises within-group variance while maximising among-group variance []. Furthermore, this analysis does not assume HWE or linkage equilibrium, assumptions that are often violated in small, fragmented natural populations [].

DAPC was carried out in the R package adegenet [], implemented in R 2.12 (R Development Core Team 2013), with K selected using the find.clusters function and the Bayesian Information Criterion (BIC). We also ran DAPC using sample locations as groups (K = 4) to assess the differentiation of our sample sites. PCA was performed in R using the dudi.pca function in the package ade4 []. Missing data were replaced with the mean (the origin of the X- and Y-axes, as in Horne et al. []).

Determining the number of principal components (PCs) to retain as predictors for the discriminant analysis requires balancing the statistical power gained from more PCs against the stability of assignments; there is no strict rule. Retaining too many PCs relative to the sample size can result in over-fitting the data. This trade-off can be assessed using the a.score function in the R package adegenet [].
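The mean replacement of missing data mentioned above can be illustrated with a small sketch. It assumes a hypothetical table of allele counts (rows are individuals, columns are alleles, `None` marks a missing genotype) and shows why an imputed value sits at the origin once the data are centred, as they are before a PCA. It is written in Python for illustration only and is not the adegenet or ade4 code used in the protocol; the function name `impute_and_centre` is made up.

```python
def impute_and_centre(matrix):
    """Replace None by the column mean, then subtract the column mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    centred = []
    for row in matrix:
        centred.append([
            (col_means[j] if row[j] is None else row[j]) - col_means[j]
            for j in range(n_cols)
        ])
    return centred


# Hypothetical allele counts for four individuals at three alleles.
toy_counts = [
    [2, 0, 1],
    [None, 1, 1],
    [1, 1, None],
    [1, 2, 0],
]

centred = impute_and_centre(toy_counts)
for row in centred:
    print(row)
```

After centring, each imputed entry equals zero, so a missing genotype contributes nothing to that individual's position along the axes built from that column.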
Analyses were carried out retaining a conservative 13 PCs, the optimal number suggested by a.score, given our relatively small dataset. […]
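The reason for standardising fixation indices from microsatellite data, described earlier in this protocol, can be shown with a worked example. The sketch below is an illustration under stated assumptions, not the RECODEDATA and Arlequin procedure the protocol used: it substitutes Nei's GST for the AMOVA-based estimator, assumes equal population sizes and known allele frequencies, and applies Hedrick's expression for the maximum attainable value, GST(max) = (k − 1)(1 − HS) / (k − 1 + HS). All frequencies and function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst_and_standardised(pops):
    """Return (GST, GST / GST_max) for a list of allele-frequency dicts.

    Assumes equal population sizes and at least some within-population
    and total diversity (0 < HS < 1 and HT > 0).
    """
    k = len(pops)
    hs = sum(expected_heterozygosity(p) for p in pops) / k
    alleles = set().union(*pops)
    pooled = {a: sum(p.get(a, 0.0) for p in pops) / k for a in alleles}
    ht = expected_heterozygosity(pooled)
    gst = (ht - hs) / ht
    gst_max = (k - 1) * (1.0 - hs) / (k - 1 + hs)
    return gst, gst / gst_max


# Two hypothetical populations, each with ten equally frequent alleles
# and no alleles in common.
pop_a = {f"a{i}": 0.1 for i in range(10)}
pop_b = {f"b{i}": 0.1 for i in range(10)}

gst, standardised = gst_and_standardised([pop_a, pop_b])
print(round(gst, 4), round(standardised, 4))
```

In this toy case the two populations share no alleles, yet the unstandardised value is only about 0.05 because within-population diversity (HS = 0.9) is close to the total (HT = 0.95). Dividing by the maximum attainable value rescales it to 1, which is the behaviour the standardisation is intended to restore.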

Pipeline specifications