Computational protocol: Fine scale patterns of genetic partitioning in the rediscovered African crocodile, Crocodylus suchus (Saint-Hilaire 1807)

Protocol publication

[…] We screened eleven crocodile-specific microsatellite loci that were previously found to be informative in C. suchus. Of the loci screened, nine (Cj18, Cj119, Cj104, Cj128, Cj35, Cj101, Cj131, Cj16, and Cud68) amplified properly and were polymorphic. We performed simplex PCR in 16 µL reactions consisting of 10.0 ng DNA template, 0.4 µM fluorescently labeled forward primer, 0.4 µM reverse primer, and 1X Applied Biosystems AmpliTaq Gold 360 Master Mix. PCR conditions were as follows: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 4 min, annealing at TA °C for 1 min, and extension at 72 °C for 1 min 30 s; and a final extension at 72 °C for 10 min. We used negative controls in all reactions and visualized PCR products on 1.0% agarose gels to confirm successful amplification. We multipooled PCR products and ran them on an ABI 3100 DNA Analyzer with GeneScan 500 LIZ size standard (Applied Biosystems Inc., Carlsbad, CA, USA). We scored alleles in GeneMarker 2.2.0 (SoftGenetics, State College, PA, USA). Prior to all downstream analyses, we removed individuals in which alleles could not be identified at more than one microsatellite locus (full genotypes, n = 89).
We examined microsatellite data for scoring errors and null alleles using MICRO-CHECKER. We assessed departure from Hardy–Weinberg Equilibrium (HWE) and occurrence of linkage disequilibrium in GENEPOP 4.2. We used the genetics software package GenAlEx 6.5 to estimate expected heterozygosity (He), observed heterozygosity (Ho), and number of alleles (A), and HP-Rare 1.1 to calculate allelic richness (AR) and private allelic richness (PAR). [...]
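The per-locus summary statistics named above (A, Ho, He) can be illustrated with a minimal Python sketch. It assumes diploid genotypes stored as allele-size pairs and uses the simple estimator He = 1 − Σp², where p is the frequency of each allele; the function name, the data layout, and the example allele sizes are hypothetical, and the sketch is not a reproduction of GenAlEx output.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one microsatellite locus.

    genotypes: list of (allele1, allele2) tuples; None marks a missing genotype.
    Returns (A, Ho, He): number of alleles, observed heterozygosity, and
    expected heterozygosity computed as 1 - sum of squared allele frequencies.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    n = len(scored)
    # Each diploid individual contributes two allele copies.
    allele_counts = Counter(allele for g in scored for allele in g)
    total_copies = 2 * n
    ho = sum(1 for a, b in scored if a != b) / n
    he = 1.0 - sum((c / total_copies) ** 2 for c in allele_counts.values())
    return len(allele_counts), ho, he


# Hypothetical allele sizes (bp) at one locus; the fourth individual is unscored.
example = [(180, 184), (180, 180), (184, 188), None, (180, 184)]
A, Ho, He = locus_summary(example)
print(A, Ho, He)
```

Missing genotypes are dropped per locus here, so the sample size can differ between loci.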
We employed three Bayesian clustering methods, each of which identifies clusters of individuals under different assumptions of inheritance, to assess genetic population structure: STRUCTURE 2.0, BAPS 6.0, and TESS 2.3.
STRUCTURE 2.0 attempts to identify natural groupings of individual multilocus genotypes by arranging samples into K clusters in a way that minimizes deviations from Hardy–Weinberg Equilibrium and linkage equilibrium. We implemented a correlated allele frequency model with admixture and no sample locality information. For each analysis we conducted 20 independent replicate runs for each a priori assumed number of clusters (K), with K varying from 1 to 16, where 16 is the number of sampling localities. Each run consisted of an initial burn-in of 1×10^6 steps followed by 1×10^7 post-burn-in replicates. We estimated the optimal number of clusters (K) by examining Ln P(X∣K) and ΔK in the program STRUCTURE HARVESTER. The ΔK method finds the breakpoint in the slope of the distribution of log-likelihood scores to infer K; however, it may be unreliable for K = 1 or where multimodality in log-likelihood scores makes selection of K from ΔK difficult. Therefore, we visually compared bar plots of individual Q-values from the chosen K to bar plots from other K-values, and chose the final most likely number of clusters by combining the ΔK method with our understanding of C. suchus ecology and the western African landscape.
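As an illustration of the ΔK calculation, the following Python sketch divides the absolute second-order difference of the mean Ln P(X∣K) by the standard deviation of Ln P(X∣K) across replicate runs at that K. The function name and the log-likelihood values are hypothetical, and the sketch is an assumption about the general form of the statistic rather than a reproduction of STRUCTURE HARVESTER.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of Ln P(X|K) values, one per replicate run
    (at least two replicates per K are needed for the standard deviation).
    Returns a dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        has_neighbours = (k - 1) in means and (k + 1) in means
        if has_neighbours and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical Ln P(X|K) values from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -892.0, -888.0],
    4: [-885.0, -889.0, -881.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both K − 1 and K + 1, it is undefined for the smallest and largest K tested, which is one reason ΔK alone cannot support K = 1.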
We conducted cluster matching across the independent replicate runs for relevant K-values in CLUMPP v1.1.2 and constructed bar plots in DISTRUCT v1.1.
BAPS 6.0 uses a stochastic optimization algorithm, rather than Markov chain Monte Carlo (MCMC), to assess optimal partitions of the data and allows the geographic coordinates of each sample locality to be included as biologically relevant non-uniform priors that help the algorithm identify meaningful genetic clusters. We performed spatial mixture clustering of individuals for 20 replicates with a maximum of K = 16 clusters. We selected the clustering solution with the highest posterior probability as the partitioning on which to perform the admixture analysis. We used the recommended parameter values, including 200 iterations for individuals, 200 reference individuals from each population, and 20 iterations for each reference individual. We visualized the results and created bar plots in DISTRUCT 1.1.
Like STRUCTURE, TESS 2.3 uses an MCMC approach to define genetic clusters under the assumption of HWE. This program also allows for spatial clustering and detailed admixture analysis. In the admixture analysis, we ran 50,000 MCMC iterations (10,000 burn-in) five times for each K from K = 2 to K = 16, with spatial locations for all individuals. TESS requires unique coordinates for each individual sampled, so coordinates were randomly created within TESS for populations that lacked specific coordinate data for each individual. To estimate the number of clusters (K), we used the deviance information criterion (DIC) to evaluate runs for convergence. We conducted cluster matching across the independent replicate runs for relevant K-values in CLUMPP v1.1.2 and constructed bar plots in DISTRUCT v1.1.
We used the results of the STRUCTURE analysis to determine the clusters to be analyzed in the following two analyses (FST and BAYESASS).
We preferred these results over those of the other two methods because STRUCTURE does not a priori incorporate spatial data, which we felt could introduce a potential source of bias given the unequal distribution of sampling across Central and West Africa. In addition, we excluded individuals sampled at the Accra Zoo (n = 2) and Kinshasa Reptile Park (n = 2) from the following two analyses (FST and BAYESASS) due to unreliable original locality information.
We assessed the significance of genetic differentiation (FST) among clusters in ARLEQUIN. We analyzed both seven and eight populations, with the Senegambian and Guinean samples alternately lumped into one population or split into two. We used 10,100 permutations to test for significance. We conducted a principal coordinate analysis (PCoA) based on pairwise FST values in GENALEX and plotted the results to visualize the relationships among populations.
We implemented a Bayesian MCMC approach in BAYESASS v1.3 to estimate the direction and rate of contemporary gene flow between populations. The method does not assume that populations are in genetic equilibrium or HWE. As with the FST analysis, we analyzed both seven and eight populations, with the Senegambian and Guinean samples alternately lumped into one population or split into two. Initial runs consisted of 3×10^6 iterations, sampled every 2,000 iterations with a burn-in of 1×10^6 iterations, and were used to adjust the delta values for allele frequency, migration rate, and inbreeding so that 40–60% of the total changes were accepted. After acceptable delta values were determined, we performed five runs of 2×10^7 iterations, sampled every 2,000 iterations with a burn-in of 1×10^7 iterations. To ensure consistency of results between runs, each run used a different random starting seed. We present the results from the run with the highest log-likelihood. […]
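The BAYESASS run settings imply a simple bookkeeping exercise, sketched below in Python: how many samples remain after burn-in, whether an acceptance rate falls in the 40–60% window, and which of several seeded runs has the highest log-likelihood. All function names, seeds, and log-likelihood values are hypothetical, and the sample counts are approximate because the exact sampling boundaries inside BAYESASS are not stated in the text.

```python
def retained_samples(iterations, burn_in, sampling_interval):
    """Approximate number of MCMC samples kept after discarding the burn-in."""
    if burn_in >= iterations:
        raise ValueError("burn-in must be shorter than the chain")
    return (iterations - burn_in) // sampling_interval


def acceptance_ok(rate, low=0.40, high=0.60):
    """True if the proportion of accepted proposals lies in the target window."""
    return low <= rate <= high


def best_run(log_likelihood_by_seed):
    """Return the seed of the run with the highest log-likelihood."""
    return max(log_likelihood_by_seed, key=log_likelihood_by_seed.get)


# Tuning runs: 3x10^6 iterations, 1x10^6 burn-in, sampled every 2,000.
tuning = retained_samples(3_000_000, 1_000_000, 2_000)
# Final runs: 2x10^7 iterations, 1x10^7 burn-in, sampled every 2,000.
final = retained_samples(20_000_000, 10_000_000, 2_000)
print(tuning, final)

# Hypothetical log-likelihoods for runs started from different seeds.
runs = {11: -4210.5, 23: -4198.2, 37: -4203.9}
print(best_run(runs))
```

Under these assumptions, each tuning run keeps about 1,000 samples and each final run about 5,000.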

Pipeline specifications

Software tools: GeneMarker, Genepop, GenAlEx, BAPS, Structure Harvester, CLUMPP, DISTRUCT, Arlequin
Applications: Phylogenetics, Population genetic analysis
Organisms: Crocodylus niloticus, Guizotia abyssinica