Computational protocol: Genetic Evidence for Multiple Sources of the Non-Native Fish Cichlasoma urophthalmus (Günther; Mayan Cichlids) in Southern Florida

Protocol publication

[…] Sequences were aligned using Sequencher v.4.8 and checked manually. Cytochrome b haplotypes were analyzed using MRMODELTEST 2.3 and MRBAYES 3.2. We conducted hierarchical hypothesis tests to select the appropriate evolutionary model for subsequent Bayesian phylogenetic analysis. The program MRMODELTEST calculated base frequencies, which were used to model the prior probability distribution; likelihood ratio tests selected the TrN model (equal transversion rates but two different transition rates) for the Bayesian analysis. Bayesian phylogenetic analysis was run for 1,000,000 generations, sampling every 100 generations. We discarded the initial 10% of trees as the ‘burn-in period’ and built a 50% majority-rule consensus tree from the remaining Bayesian trees. The analysis was repeated twice to avoid searching within local optima. The phylogenetic tree was used to identify distinct clades where haplotypes were shared among Mayan Cichlids from southern Florida and from the native range. Unlike typical phylogenetic trees that include taxa on their branches, we replaced the taxa with sampling locations to examine the phylogenetic relationships among sites, resulting in a general area cladogram.
To investigate the relationships between clades, haplotype networks were built using Network v. 4.6.11 and Network Publisher (http://www.fluxus-engineering.com/). The maximal pairwise difference between sequences was 6 and the transversion:transition ratio was weighted as 2∶1; we therefore specified the weighted genetic distance (epsilon) as 120 and conducted a median-joining analysis using the greedy distance calculation method. [...]
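The run-length and network settings above involve some simple arithmetic, sketched here in Python. The function names are hypothetical. The per-character base weight of 10 is an assumption about Network's default weighting, not something the protocol states; under that assumption, 6 differences at a doubled weight of 20 reproduce the reported epsilon of 120. The tree count ignores the initial-state sample that a Bayesian run may also write.

```python
def post_burnin_trees(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, kept) tree counts for one MCMC run.

    Ignores the tree written at generation 0 (an assumption made for simplicity).
    """
    sampled = generations // sample_every
    discarded = round(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


def weighted_epsilon(max_pairwise_diff, upweight, base_weight=10):
    """Weighted distance for the median-joining epsilon setting.

    base_weight=10 is an ASSUMED default per-character weight; upweight is the
    2:1 factor applied to the more heavily weighted substitution class.
    """
    return max_pairwise_diff * upweight * base_weight


sampled, discarded, kept = post_burnin_trees(1_000_000, 100, 0.10)
print(sampled, discarded, kept)
print(weighted_epsilon(6, 2))
```

With the values reported in the text, 9,000 trees per run remain for the consensus.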
The number of different alleles, the number of effective alleles, observed and expected heterozygosities, inbreeding coefficient (FIS) and percentages of polymorphic loci were calculated for Florida, Upper Yucatán Peninsula, South of Yucatán Peninsula, Belize, Guatemala, Honduras, and Nicaragua using GenAlEx v.6.5.
To detect evidence of a recent bottleneck or reduction in population size of Mayan Cichlids in Florida, we used the software Bottleneck v.1.2.02. We performed the Wilcoxon signed rank test to test for heterozygosity excess. When a bottleneck occurs, both the number of alleles and heterozygosity are expected to decrease; however, the number of alleles is expected to decrease faster than heterozygosity. Thus, the program Bottleneck tests for heterozygosity excess by comparing expected heterozygosity under Hardy-Weinberg equilibrium to heterozygosity expected under mutation-drift equilibrium determined by the number of alleles. We tested for heterozygosity excess under the Stepwise Mutation Model.
Genetic relatedness of populations was assessed using Bayesian clustering in STRUCTURE v.2.3.4. STRUCTURE was used to estimate the number of populations (K) most likely present in the samples. The parameters were set using an admixture model with independent allele frequencies, and sampling locations were used as priors; values for the level of admixture (alpha) were inferred from the dataset. STRUCTURE analyses were performed using the freely available Bioportal server (http://www.bioportal.uio.no). The burn-in length was set to 50,000 and the simulation to 500,000 repetitions. Each run was iterated 20 times. We evaluated results for K = 1 to K = 35. To determine the most probable clustering of the data, K was selected using the ΔK approach as implemented by Structure Harvester. The variable ΔK is calculated from the rate of change of the log likelihood of the data between runs with successive values of K.
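The ΔK statistic described above can be illustrated with a minimal Python sketch. It takes the absolute second-order difference of the mean log likelihood across successive K and divides by the standard deviation of the replicate log likelihoods at K. The function name and the toy log-likelihood values are hypothetical; this is not Structure Harvester's code, and the numbers are not results from the study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for each interior K.

    lnp_by_k maps K -> list of Ln P(D) values from replicate runs.
    Delta K(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L per K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Toy replicate log likelihoods (illustrative only)
toy = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -792, -788],
    4: [-785, -789, -781],
}
dk = delta_k(toy)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the first or last K tested (K = 1 and K = 35 in this protocol).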
CLUMPP v.1.1.2 was used to summarize parameters across 20 iterations and the corresponding graphical output was visualized using DISTRUCT v.1.1.
Approximate Bayesian Computation (ABC) was used to test different introduction pathways of Mayan Cichlids into Florida using the microsatellite data and the program DIYABC. ABC uses summary genetic statistics (such as genetic distance and the number of alleles) to compare observed and simulated datasets given hypothesized scenarios. Posterior distributions of parameters for the proposed models (possible introduction pathways in our case) are calculated from the differences between the observed and simulated datasets. Hypotheses and scenarios were generated on the basis of the results of phylogenetic analyses of cytochrome b, population assignment by cluster analysis, and the historical biogeography and hydrology of the native range. Cytochrome b phylogeny indicated that samples from Belize, Honduras and Nicaragua were within the same clade, and cluster analysis also grouped samples from those regions, although there appeared to be some overlap among individuals from Belize and Florida. Cytochrome b data also showed that samples from both the eastern and western coasts of Florida were within the same clade and also part of the same cluster.
We tested two groups of scenarios using the software DIYABC v. 2.0, wherein the scenarios increased in complexity by changing the grouping of samples into populations to improve model fit. The results from the first group of scenarios informed the second group.
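The ABC logic described above (simulate under each scenario, keep the simulations closest to the observed summary statistics, compare scenarios among those kept) can be sketched as a toy rejection sampler in Python. Everything here is hypothetical: the two scenarios, the one-dimensional Gaussian "summary statistic", and the function names. It uses the simple proportion of retained simulations per scenario, which is a plainer estimator than the logistic regression used by DIYABC.

```python
import random


def simulate_stat(scenario, rng):
    """Toy simulator: one summary statistic per simulated dataset (assumed model)."""
    centre = {"A": 0.0, "B": 5.0}[scenario]
    return rng.gauss(centre, 1.0)


def abc_scenario_choice(observed, scenarios, n_sims, keep_fraction, seed=0):
    """Rejection ABC: posterior probability of each scenario among the
    keep_fraction of simulations closest to the observed statistic."""
    rng = random.Random(seed)
    sims = []
    for scenario in scenarios:
        for _ in range(n_sims):
            distance = abs(simulate_stat(scenario, rng) - observed)
            sims.append((distance, scenario))
    sims.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(len(sims) * keep_fraction))
    kept = [scenario for _, scenario in sims[:n_keep]]
    return {s: kept.count(s) / n_keep for s in scenarios}


posterior = abc_scenario_choice(0.0, ["A", "B"], n_sims=2000, keep_fraction=0.01)
print(posterior)
```

In this toy case the observed value sits at the centre of scenario A, so nearly all retained simulations come from A.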
The first group contained 15 scenarios that used five distinct populations: Florida, Mexico, Guatemala, a possible unsampled source population, and a grouping of Belizean, Honduran and Nicaraguan sites (hereafter referred to as BHN). Belize, Honduras and Nicaragua were grouped together because they shared the same cytochrome b haplotype and were assigned to the same population by Bayesian cluster analysis. Samples from East and West Florida were combined into one population because both phylogenetic analysis and cluster analysis grouped them together. In the first grouping of scenarios, we tested whether Mayan Cichlids were introduced into Florida from BHN, Mexico, Guatemala, from both Mexico and Guatemala, or from an unsampled population in Central America. We also included a possible unsampled, ‘ghost’ population of Mayan Cichlids in Central America which, in some scenarios, was the source for populations in Mexico and Guatemala.
The second group contained nine scenarios that merged cytochrome b results and hydrology of the region; we separated the Mexican samples into two populations, Upper Yucatán Peninsula (YP) and south of the Yucatán Peninsula, and categorized Belizean samples as a distinct group because the Belizean sites are within the Usumacinta Province, unlike the Honduran and Nicaraguan sites, which were grouped together. The cenote-rich Upper Yucatán Peninsula lacks any major rivers or drainages that connect it to the regions south of the Peninsula, so we treated those areas as separate populations for the second group of scenarios.
The second group of nine scenarios used the population from south of the Yucatán Peninsula as the most recent common ancestor (MRCA) and tested whether Mayan Cichlids in Florida were introduced from Mexico, Guatemala, or Belize, or whether there were multiple introductions from those regions.
For both sets of scenario analyses in DIYABC, we used broadly defined priors as no prior values were known for the parameters. We used the Generalized Stepwise Mutation Model with a uniform prior distribution for the mean mutation rate (1E-4 to 1E-3). The ‘one sample summary statistics’ used for each population were the mean number of alleles, the mean genetic diversity, the mean size variance, and the mean Garza-Williamson's M. The ‘two sample summary statistics’ were compared between population pairs and included Fst, mean index of classification (the mean individual assignment likelihood of individuals collected in one population and assigned to another population), and (δμ)² genetic distance. For each scenario, 1,000,000 simulated datasets were created. Prior-scenario combinations were evaluated using Principal Components Analysis (PCA) as implemented by the software. Posterior probabilities of scenarios were compared with logistic regression using 1% of the closest simulated datasets, as implemented by DIYABC v. 2.0. Parameter estimates were also computed, and their performance was evaluated by assessing confidence and bias as implemented by the software. […]
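Two of the microsatellite summary statistics named above have compact definitions that a short Python sketch can make concrete. The sketch assumes the standard forms: Garza-Williamson's M as the number of alleles divided by the allelic size range plus one, and (δμ)² as the squared difference in mean allele size between two populations, averaged over loci. Allele sizes are taken to be in repeat units; the function names and toy genotypes are hypothetical, and this is not DIYABC's implementation.

```python
from statistics import mean


def garza_williamson_m(allele_sizes):
    """M = k / (R + 1) for one locus: k distinct alleles, R = max - min size
    (sizes in repeat units)."""
    alleles = set(allele_sizes)
    allelic_range = max(alleles) - min(alleles)
    return len(alleles) / (allelic_range + 1)


def delta_mu_squared(pop_a, pop_b):
    """(delta mu)^2 distance: squared difference of mean allele sizes,
    averaged over loci. Each population is a list of loci, each locus a
    list of allele sizes in repeat units."""
    per_locus = [
        (mean(locus_a) - mean(locus_b)) ** 2
        for locus_a, locus_b in zip(pop_a, pop_b)
    ]
    return mean(per_locus)


# Toy data (illustrative only): one locus for M, two loci for the distance
m_value = garza_williamson_m([10, 11, 12, 12, 15])
distance = delta_mu_squared([[10, 12], [5, 5]], [[14, 16], [5, 7]])
print(m_value, distance)
```

A low M indicates gaps in the allele size distribution, which is the pattern expected after a reduction in population size.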

Pipeline specifications