Computational protocol: Little Divergence Among Mitochondrial Lineages of Prochilodus (Teleostei, Characiformes)


Protocol publication

[…] Specimens were collected under permanent collection permit number 13843-1 issued by MMA/IBAMA/SISBIO and subsequently preserved in 95% ethanol. We included 146 specimens spanning all 13 species of Prochilodus collected across South America, plus Semaprochilodus taeniurus to root the trees (147 taxa in total). We sequenced barcodes for 19 specimens and supplemented the matrix with 127 additional Prochilodus barcodes available in the public genetic databases GenBank and the Barcode of Life Database (BOLD); the Supplementary Table lists voucher and locality information and database accession numbers. Genomic DNA was extracted from ethanol-preserved muscle tissue with a DNeasy Tissue kit (Qiagen Inc.) according to the manufacturer's instructions. We obtained partial sequences of the mitochondrial gene cytochrome c oxidase subunit I (COI) by polymerase chain reaction (PCR) amplification with primers described in the literature (Melo et al., ), modifying the reaction as follows: a total volume of 12.5 μl containing 9.075 μl of double-distilled water, 1.25 μl of 5x buffer, 0.375 μl of MgCl2 (50 mM), 0.25 μl of dNTP mix, 0.25 μl of each primer at 10 μM, 0.05 μl of Platinum Taq DNA polymerase (5 units/μl, Invitrogen), and 1.0 μl of genomic DNA (10–50 ng). The PCR program consisted of an initial denaturation (4 min at 95°C) followed by 28–30 cycles of denaturation (30 s at 95°C), primer annealing (30–60 s at 52–54°C), and extension (30–60 s at 72°C). After visualizing the amplicons on 1% agarose gels, we performed the sequencing reaction with dye terminators (BigDye™ Terminator v3.1 Cycle Sequencing Ready Reaction Kit, Applied Biosystems) and purified the sequencing products by ethanol precipitation. The samples were then run on an ABI 3130 Genetic Analyzer automatic sequencer (Applied Biosystems) at São Paulo State University, Brazil. [...] 
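As a quick consistency check on the reaction recipe above, final concentrations follow from the dilution rule C1·V1 = C2·V2. The minimal Python sketch below sums the reported volumes and derives the final MgCl2 and primer concentrations and the enzyme units per reaction. Splitting "each primer" into forward and reverse entries is our own labeling, and the dNTP mix is left out of the calculation because its stock concentration is not reported.

```python
import math

# Volumes (ul) as reported for one 12.5 ul reaction.
# The forward/reverse primer labels are illustrative; the text says "each primer".
RECIPE_UL = {
    "ddH2O": 9.075,
    "5x buffer": 1.25,
    "MgCl2 (50 mM)": 0.375,
    "dNTP mix": 0.25,
    "forward primer (10 uM)": 0.25,
    "reverse primer (10 uM)": 0.25,
    "Platinum Taq (5 U/ul)": 0.05,
    "genomic DNA": 1.0,
}


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilution rule C2 = C1 * V1 / V2 (result is in the stock's units)."""
    return stock * volume_ul / total_ul


total_ul = sum(RECIPE_UL.values())
volumes_add_up = math.isclose(total_ul, 12.5)  # components should fill the stated volume
mgcl2_mM = final_concentration(50.0, RECIPE_UL["MgCl2 (50 mM)"], total_ul)
primer_uM = final_concentration(10.0, RECIPE_UL["forward primer (10 uM)"], total_ul)
taq_units = 5.0 * RECIPE_UL["Platinum Taq (5 U/ul)"]  # enzyme units per reaction

if __name__ == "__main__":
    print(f"total volume: {total_ul:.3f} ul (matches 12.5 ul: {volumes_add_up})")
    print(f"MgCl2: {mgcl2_mM:.2f} mM, each primer: {primer_uM:.2f} uM, Taq: {taq_units:.2f} U")
```

With the reported volumes, the components add up to 12.5 μl, giving 1.5 mM MgCl2, 0.2 μM of each primer and 0.25 U of Taq per reaction.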
We assembled and edited the newly generated consensus sequences in Geneious 7.1.9 (Kearse et al., ) and aligned the whole matrix with MUSCLE (Edgar, ). This matrix contains 147 taxa (146 Prochilodus plus one Semaprochilodus) and 648 bp. To evaluate substitution saturation, we estimated the index of substitution saturation for asymmetrical (Iss.cAsym) and symmetrical (Iss.cSym) topologies in DAMBE 5.3.38 (Xia, ). We used PartitionFinder 1.1.0 (Lanfear et al., ) to select the best-fit model of nucleotide evolution for our dataset. Species were identified beforehand following the most recent and complete taxonomic revision (Castro and Vari, ), and lineages were proposed based on the topologies obtained subsequently. Most available sequences are from vouchers already identified by the first author (e.g., Melo et al., ) or from previous studies of endemic species (Carvalho et al., ; Rosso et al., ; Pereira et al., ; Díaz et al., ). We then calculated overall and pairwise genetic distances under the Kimura two-parameter (K2P) + Gamma model in MEGA 7.0 (Tamura et al., ) and built a neighbor-joining (NJ) tree with 1,000 bootstrap replicates in Geneious 7.1.9. We also performed a maximum likelihood (ML) analysis in RAxML HPC-PTHREADS-SSE3 (Stamatakis, ) from five random parsimony starting trees under the GTRGAMMA model (Stamatakis et al., ), without rooting and with other parameters at their defaults. We used the autoMRE option to generate bootstrap pseudoreplicates under the MRE-based bootstopping criterion (Pattengale et al., ), which ran a total of 650 replicates. Bootstopping criteria determine when enough replicates have been generated, which makes robust ML bootstrapping computationally practical (Pattengale et al., ). An ultrametric gene tree was inferred by Bayesian inference in BEAST 1.8.0 (Drummond et al., ) using two independent runs of 50 million generations, sampling trees every 5,000 generations.
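To show what the K2P (+Gamma) distance mentioned above computes, here is a minimal Python sketch of the Kimura two-parameter distance and its standard gamma-corrected form. It is not MEGA's implementation: it assumes already-aligned sequences of equal length, skips any site where either sequence has a gap or ambiguity code (pairwise deletion), and takes the gamma shape parameter `alpha` as a user-supplied value, since the text does not report one. The function name `k2p_distance` is hypothetical.

```python
import math
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str, alpha: Optional[float] = None) -> float:
    """Kimura two-parameter distance; gamma-corrected when alpha (shape) is given.

    Sites with gaps or ambiguity codes in either sequence are skipped
    (pairwise deletion). Sequences must already be aligned.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine change.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences (P)
    q = transversions / compared  # proportion of transversional differences (Q)
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance is undefined (sequences too divergent)")
    if alpha is None:
        # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    # Gamma form: d = (alpha/2) [ (1-2P-Q)^(-1/alpha) + 1/2 (1-2Q)^(-1/alpha) - 3/2 ]
    return 0.5 * alpha * (w1 ** (-1.0 / alpha) + 0.5 * w2 ** (-1.0 / alpha) - 1.5)


if __name__ == "__main__":
    s1 = "ACGTACGTAC"
    s2 = "GCGTACGTAA"  # one transition (A->G) and one transversion (C->A)
    print(round(k2p_distance(s1, s2), 4))
    print(round(k2p_distance(s1, s2, alpha=0.5), 4))
```

As `alpha` grows large the gamma form converges to the plain K2P distance; small values of `alpha` (strong rate heterogeneity among sites) inflate the corrected distance.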
Convergence was assessed in Tracer v1.5 (Rambaut et al., ), requiring effective sample sizes (ESS) above 200. The first 10% of trees from each run were discarded as burn-in, and the remaining MCMC samples were summarized as a maximum clade credibility (MCC) tree in TreeAnnotator v1.4.7 (Drummond et al., ) and visualized in FigTree v1.4.3. The generalized mixed Yule coalescent (GMYC) method (Pons et al., ; Fujisawa and Barraclough, ) was applied to the ultrametric gene tree estimated with the exponential growth coalescent model (Griffiths and Tavaré, ) and the lognormal relaxed clock model (Drummond et al., ), which assumes that rates of molecular evolution are uncorrelated among lineages and drawn from a lognormal distribution. GMYC species delimitation used standard parameters [interval = c(0, 10)] and a single threshold, which specifies the transition time between between-species (Yule) and within-species (coalescent) branching. This analysis was run with the package splits (Species Limits by Threshold Statistics) in R v.3.0.0 (R Development Core Team, ). GMYC appears useful for single-locus analyses (Fujisawa and Barraclough, ), but its results should be corroborated with additional data or analyses of independent characters (Esselstyn et al., ). Additionally, we used the Bayesian Poisson Tree Processes (bPTP) model (Zhang et al., ) on the bPTP web server under default parameters. bPTP does not require an ultrametric gene tree; instead, it takes as input a tree in nexus format whose branch lengths represent numbers of nucleotide substitutions (Zhang et al., ). We used as input a nexus MCC tree generated in BEAST 1.8.0 (Drummond et al., ) and ran 500,000 generations (thinning = 500). We also applied a clustering approach to species delimitation, the Automatic Barcode Gap Discovery (ABGD; Puillandre et al., ), which automatically partitions sequences into hypothetical candidate species based on a range of prior limits for intraspecific divergence.
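The run settings and convergence criterion described above can be illustrated with a short Python sketch: 50 million generations sampled every 5,000 give about 10,000 samples per run, and discarding the first 10% as burn-in leaves about 9,000. The `ess` function uses a simple autocorrelation estimator that truncates the sum at the first non-positive lag; Tracer's own estimator differs in detail, so this is an illustration of the idea rather than a reimplementation. All function names here are hypothetical.

```python
from typing import List, Sequence


def samples_per_run(chain_length: int, sample_every: int) -> int:
    """Logged samples per run, not counting any sample written at state 0."""
    return chain_length // sample_every


def discard_burnin(trace: Sequence[float], fraction: float = 0.10) -> List[float]:
    """Drop the first `fraction` of samples as burn-in."""
    return list(trace[int(len(trace) * fraction):])


def ess(trace: Sequence[float]) -> float:
    """Effective sample size, n / (1 + 2 * sum of autocorrelations).

    The autocorrelation sum stops at the first lag whose autocorrelation
    is not positive (a simple, common truncation heuristic). Quadratic in
    the worst case, so intended only for illustration.
    """
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    gamma0 = sum(d * d for d in dev) / n
    if gamma0 == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        gamma = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / n
        rho = gamma / gamma0
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def converged(trace: Sequence[float], threshold: float = 200.0) -> bool:
    """Apply the ESS > 200 criterion used in the text."""
    return ess(trace) > threshold


if __name__ == "__main__":
    n = samples_per_run(50_000_000, 5_000)
    kept = len(discard_burnin(range(n)))
    print(n, kept)
```

Running the script prints `10000 9000`. A strongly autocorrelated trace yields an ESS far below its raw sample count, which is why the ESS, not the number of logged samples, is used as the convergence criterion.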
We used as input to the ABGD web server a pairwise distance matrix generated in MEGA 7.0 (Kumar et al., ) under the K2P+G model with 1,000 pseudoreplicates, leaving other parameters at their defaults. Population genetic analyses were conducted to assess levels of genetic variation among haplotypes. To run these analyses properly, we excluded four taxa and excised flanking regions with high proportions of missing data, yielding a reduced matrix of 143 taxa and 465 bp. Each mitochondrial lineage previously delimited by the distance and likelihood analyses was treated as a distinct population. We used DnaSP v.5.10.01 (Librado and Rozas, ) to obtain the number of polymorphic sites, the number of haplotypes, and nucleotide and haplotype diversity. In Arlequin 3.5.1 (Excoffier and Lischer, ), each mitochondrial lineage was set as a single population, with the following hypothetical group structure based on the arrangement of the ML and Bayesian trees: group 1 = outgroup; group 2 = lineage 1; group 3 = lineage 2; group 4 = lineages 3, 4, and 5; group 5 = lineages 6, 7, and 8. We ran an analysis of molecular variance (AMOVA; Excoffier et al., ) with 1,000 permutations using conventional F-statistics, and generated the haplotype network with the median-joining algorithm (Bandelt et al., ) implemented in PopART 1.7 (Leigh and Bryant, ). […]
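To make the DnaSP summary statistics described above concrete, the sketch below computes, for one lineage treated as a population, the number of polymorphic (segregating) sites, the number of haplotypes, haplotype diversity Hd = n/(n - 1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i, and nucleotide diversity pi (mean number of pairwise differences per site). It is a simplified stand-in, not DnaSP itself: it assumes equal-length aligned sequences with no gaps or missing data, the function name `diversity_summary` is hypothetical, and the toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def diversity_summary(seqs: Sequence[str]) -> Dict[str, float]:
    """Polymorphic sites, haplotype count, Hd and pi for aligned, gap-free sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # A site is polymorphic if more than one nucleotide occurs there.
    polymorphic = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)

    # Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).
    counts = Counter(seqs)
    hd = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))

    # Nucleotide diversity: mean number of pairwise differences divided by length.
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs) / length

    return {"S": polymorphic, "h": len(counts), "Hd": hd, "pi": pi}


if __name__ == "__main__":
    lineage = ["AAAA", "AAAT", "AAAT", "GAAA"]  # toy data, not from the study
    print(diversity_summary(lineage))
```

With real data, each of the eight lineages would be passed separately; DnaSP additionally handles gaps and missing data, which this sketch does not.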

Pipeline specifications