Computational protocol: Highlighting Astyanax Species Diversity through DNA Barcoding

[…] Total DNA was extracted from muscle fragments following the protocol of the Canadian Center for DNA-Barcoding (CCDB), available at […]. A segment of the 5' region of the mitochondrial COI gene was amplified using different combinations of primers, including L5698-Asn [], FishF1, FishF2, FishR1 and FishR2 [], the C_FishF1t1–C_FishR1t1 cocktail [], and H7271-COI []. Polymerase chain reactions (PCR) were run in a 12.5 μl volume containing: 1 μl DNA (concentration 50 ng/μl), 0.25 μl each of the forward and reverse primers (concentration 10 mM), 1.25 μl of reaction buffer, 0.2 μl of 200 mM dNTPs mix, 0.37 μl of MgCl2 and 0.0625 μl (5 units/μl) of Platinum Taq DNA polymerase (Invitrogen). The samples were amplified in a Veriti® 96-well thermocycler (Applied Biosystems), with an initial denaturation of 5 minutes at 96°C, followed by 35 cycles of 96°C for 45 seconds, 54°C for 45 seconds and 72°C for 1 minute, and a final extension at 72°C for 1 minute. The amplified PCR products were cleaned up with ExoSAP-IT (USB Corporation) and sequenced in both directions using the BigDye Terminator v3.1 Cycle Sequencing kit (Life Technologies) in an ABI3130 Genetic Analyzer automated sequencer (Applied Biosystems). [...] The sequences were edited in BioEdit [] and aligned in MUSCLE (Multiple Sequence Comparison by Log-Expectation) []. The initial analysis was based on genetic distances calculated in MEGA 5 [] under the Kimura-2-parameter (K2P) substitution model [], which were used to estimate a Neighbor-Joining (NJ) tree [] and to apply a 2% cutoff value. This cutoff defined the initial clusters, and all NJ clusters defined by the 2% criterion were then tested using alternative clustering methods proposed by other studies (see section Identification of Operational Taxonomic Units—OTUs). The first of these analyses was run in the command-line version of the ABGD program using the K2P model.
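As an illustration of the K2P distance that underlies both the NJ clustering and the ABGD run, the following minimal Python sketch computes the pairwise distance between two aligned sequences and compares it with the 2% cutoff. It is not the MEGA 5 or ABGD implementation: the function names, the example sequences, and the choice to skip sites with gaps or ambiguous bases (pairwise deletion) are assumptions made for this example only.

```python
import math

PURINES = {"A", "G"}
VALID_BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch, not necessarily the setting used in MEGA).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


def exceeds_cutoff(distance, cutoff=0.02):
    """True when a pairwise distance is above the 2% cutoff."""
    return distance > cutoff


if __name__ == "__main__":
    # Hypothetical 10 bp fragments differing by a single C/T transition.
    d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
    print(f"K2P distance: {d:.4f}, above 2% cutoff: {exceeds_cutoff(d)}")
```

In a real analysis the full pairwise matrix from the aligned COI sequences would be passed to a tree-building or clustering step; the sketch only shows the per-pair calculation.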
To maximize potential species discovery, the ABGD parameters were modified (relative gap width X = 0.1, Pmin = 0.005 and Pmax = 0.1) []. The BIN approach focused only on the BAST dataset in the BOLD database and the sequences from Argentina, and necessarily excluded the records from GenBank. Finally, for the GMYC analysis, ultrametric trees were generated in BEAST v1.8.0 [] using the Yule speciation model and the GTR+G+I nucleotide substitution model (selected in MEGA 5 under the BIC criterion), starting from a random tree and running for 50 million generations, with the results recorded every 5000 generations. Convergence of the parameter values was checked in TRACER v1.6 []. The GMYC analysis [] was implemented in the 'splits' (SPecies' LImits by Threshold Statistics) [] package in R, with the "single threshold" option. Only unique haplotypes were used for this analysis, given the problems that arise from analyzing redundant data, as previously reported []. To this end, repeated identical sequences were removed using the ElimDupes tool (available at […]
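The reduction to unique haplotypes described above can be sketched in a few lines of Python. This is a stand-in to show the idea, not the ElimDupes tool itself: the function name, the record format (pairs of identifier and aligned sequence) and the example identifiers are hypothetical, and sequences are treated as duplicates only when they are identical character for character after upper-casing.

```python
def unique_haplotypes(records):
    """Group sequence identifiers by identical sequence.

    records: iterable of (identifier, sequence) pairs.
    Returns a dict mapping each distinct sequence to the list of
    identifiers that share it, in order of first appearance.
    """
    groups = {}
    for seq_id, seq in records:
        key = seq.upper()
        groups.setdefault(key, []).append(seq_id)
    return groups


if __name__ == "__main__":
    # Hypothetical records; the second and third share one haplotype.
    records = [
        ("sample_1", "ACGTACGTAC"),
        ("sample_2", "ACGTACGTAT"),
        ("sample_3", "acgtacgtat"),
    ]
    for seq, ids in unique_haplotypes(records).items():
        # Keep the first identifier as the representative of each haplotype.
        print(ids[0], seq, f"(n={len(ids)})")
```

Keeping the list of identifiers for each haplotype makes it possible to map the GMYC result back to every original specimen after the analysis.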

Software tools: MUSCLE, FISHR, BioEdit, MEGA, ABGD, BEAST