

## Protocol publication

[…] To determine contemporary genetic structuring and individual assignments based on the autosomal microsatellite data set, we used discriminant analysis of principal components (DAPC), implemented in the **adegenet** package for R 2.15.3, following Silva et al. DAPC was chosen over Bayesian clustering methods because it is model-free: it does not assume Hardy–Weinberg equilibrium or the absence of linkage disequilibrium, which makes it more appropriate where such assumptions are not met, as is often the case with anchovies.

The program POWSIM 4.0 was used to evaluate statistical power for detecting pairwise genetic differentiation at $F_{ST}$ levels ranging from 0.00 to 0.10. We simulated the divergence of three subpopulations, corresponding to (1) European anchovy (E. encrasicolus, E. capensis and E. eurystole), (2) E. japonicus and (3) E. australis, from a single ancestral population through genetic drift to a given overall $F_{ST}$ value, defined by controlling effective population size ($N_e$) and number of generations (t). To best reflect the presumably large $N_e$ of anchovy, we set $N_e$ = 10,000 and varied t from 0 to 2078 to simulate different levels of differentiation. After the simulation, each subpopulation was sampled at n = 381 and divergence from genetic homogeneity was tested with a χ-exact test. This procedure was repeated 100 times, and the proportion of significant outcomes was used to estimate statistical power for detecting pairwise genetic differentiation.

Cyt b sequences were aligned using ClustalX 2.0.3 with default settings, as implemented in **Geneious** 5.4, then checked and trimmed manually. Sequences were reduced to haplotypes using Collapse 1.2. Number of individuals (N), number of haplotypes ($n_h$), haplotype diversity (h) and nucleotide diversity (π) were calculated in **Arlequin** 3.5.1.2 using the cyt b data set.

Summary statistics, namely number of individuals (N), average number of alleles ($A_{avg}$), observed heterozygosity ($H_O$) and expected heterozygosity ($H_E$), were calculated for each location and for each locus with **Genodive**. Net evolutionary divergence between putative species of OWA was calculated in **MEGA** 5 using the Maximum Composite Likelihood model. Rate variation among sites was modelled with a gamma distribution (shape parameter = 1.48).

To examine the relationship between mitochondrial haplotypes, a minimum spanning network was constructed with Arlequin 3.5.1.2 and visualized with **Hapstar**. Pairwise genetic differentiation was estimated with $G_{\mathrm{st\_est}}$ and Jost’s $D_{est}$, both within and between putative species, following Pennings et al. for mtDNA and using the R package diveRsity for microsatellites. […]

Phylogenetic relationships within Engraulidae were based on a fragment of the mitochondrial cyt b gene (121 taxa, corresponding to 55 Engraulidae species; 1044 bp). At least one representative species per Engraulidae genus (except Papuengraulis) and nine randomly chosen specimens from each of the five currently recognized OWA species were included in the phylogenetic analyses, with the exception of E. capensis, for which only four specimens were available (accession numbers in Supplementary Table). Following Lavoué et al., we selected the following outgroup species: Chirocentrus dorab, Clupea harengus, Denticeps clupeoides, Ilisha africana, Sardina pilchardus and Sundasalanx mekongensis. The Akaike Information Criterion (AIC), as implemented in **Modeltest** 3.7, selected GTR+I+Γ as the evolutionary model that best fitted the data set. The inferred parameters were used in maximum likelihood (ML) and Bayesian inference (BI) analyses. BI analyses were conducted with **MrBayes** 3.2.1. Metropolis-coupled Markov chain Monte Carlo (MCMC) analyses were run for 20,000,000 generations with a sampling frequency of 2,000. Final trees were calculated after a burn-in of 1,000 generations. **PhyML** 3.0 was used to estimate the ML tree and to test the robustness of the inferred trees by non-parametric bootstrapping with 1,000 pseudoreplicates. Previous work recovered E. encrasicolus as paraphyletic (specimens assigned to the species fell into two clades that did not group together).
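
The POWSIM drift simulation described earlier in this section sets a target level of differentiation by choosing the effective population size and the number of generations. The minimal Python sketch below assumes the standard expectation under pure drift, F_ST = 1 − (1 − 1/(2Ne))^t, with no migration or mutation. The function names `expected_fst` and `generations_for_fst` are illustrative and are not part of POWSIM.

```python
import math


def expected_fst(ne: float, t: float) -> float:
    """Expected F_ST after t generations of pure drift at effective size ne.

    Assumption: F_ST = 1 - (1 - 1/(2*ne))**t (no migration, no mutation).
    """
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t


def generations_for_fst(ne: float, fst: float) -> float:
    """Invert expected_fst: generations of drift needed to reach a target F_ST."""
    return math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2.0 * ne))


if __name__ == "__main__":
    ne = 10_000  # effective population size used in the protocol
    for t in (0, 500, 1000, 2078):
        print(f"t = {t:5d}  expected F_ST = {expected_fst(ne, t):.4f}")
```

With Ne = 10,000, t = 2078 gives an expected F_ST of roughly 0.099, consistent with the 0.00 to 0.10 range stated in the protocol.
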
To test whether natural selection could interfere with phylogenetic inference, we performed all phylogenetic analyses using the above data set, which included only individuals showing no evidence of being under selection, and repeated the procedures using another data set (116 taxa; 1044 bp) that included individuals carrying a mutation in codon 368 of cyt b, as identified in Silva et al.

To estimate the OWA origin and date lineage-splitting events within Engraulidae, we used a Bayesian relaxed molecular-clock approach, as implemented in **Beast** 2.1.3, based on a concatenated data set of four partial gene fragments: two mitochondrial (cyt b: 1131 bp; 16S: 800 bp) and two nuclear (RAG1: 1480 bp; RAG2: 1221 bp). We included sequence data from 49 Engraulidae lineages/taxa, likely representing 14 genera (out of 16; the monospecific genera Lycothrissa and Papuengraulis are not represented), of which five are OWA (accession numbers in Supplementary Table). According to our ML and BI analyses, E. capensis and E. eurystole are conspecific with E. encrasicolus, and two clades (hereafter clade A and clade B) were recovered within the latter. Hence, for the dating analysis we selected only a single representative of E. encrasicolus from clade A and two from clade B, both without the mutation at codon 368. We used the BirthDeath model for the tree prior, which assumes that at any point in time every lineage undergoes speciation at rate λ or goes extinct at rate μ, together with three calibration points. The first is the earliest record of Engraulidae, [12–6] million years ago (Ma), from the Miocene–Lower Pliocene of Cyprus. The second is the age estimated for the divergence between Anchovia clupeoides and A. macrolepidota, [3.1–2.8] Ma, attributed to the closure of the Panama seaway. The third is E. japonicus, [2–0] Ma, from the Kokubu Group, Japan.
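
To illustrate the assumption behind the BirthDeath tree prior mentioned above (each lineage speciates at rate λ or goes extinct at rate μ), here is a small Python sketch that simulates only the number of lineages through time. It is not BEAST's implementation and does not compute a prior density. The function name `simulate_lineage_counts` and the rate values in the usage line are hypothetical.

```python
import random


def simulate_lineage_counts(lam, mu, t_max, n0=1, seed=0):
    """Gillespie-style simulation of lineage counts under a birth-death process.

    lam: per-lineage speciation rate; mu: per-lineage extinction rate.
    Returns a list of (time, number_of_lineages) events, starting at (0.0, n0).
    """
    if lam < 0 or mu < 0 or lam + mu <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event across all n lineages.
        t += rng.expovariate(n * (lam + mu))
        if t > t_max:
            break
        # The event is a speciation with probability lam / (lam + mu).
        n += 1 if rng.random() < lam / (lam + mu) else -1
        history.append((t, n))
    return history


if __name__ == "__main__":
    # Illustrative rates only; they are not estimates from the protocol.
    events = simulate_lineage_counts(lam=0.5, mu=0.2, t_max=10.0, seed=42)
    print(f"{len(events) - 1} events, {events[-1][1]} lineages at the end")
```
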
Calibrations using the two fossils were modelled with a lognormal distribution in which 95% of the prior weight fell within the geological interval in which each fossil was discovered. For Engraulidae ([12–6] Ma), the lognormal calibration prior had a mean in real space of 1.4, an offset of 6.0 and a log standard deviation of 1.0. For E. japonicus ([2–0] Ma), the prior had a mean in real space of 0.465, an offset of 0 and a log standard deviation of 1.0. For the divergence between Anchovia clupeoides and A. macrolepidota, we used a calibration following Lessios et al., in which the closure of the Isthmus of Panama occurred between 3.1 and 2.8 Ma; the lognormal prior had a mean in real space of 0.071, an offset of 2.8 and a log standard deviation of 1.0. MCMC analyses were run for 20,000,000 generations, sampling every 20,000 generations, after discarding a burn-in of 2,000,000 steps. Convergence to the stationary distribution was confirmed by inspecting the MCMC samples in Tracer 1.6.

ML, BI and dating analyses were performed on the R2C2 research group cluster facility provided by the IT Department of the University of Algarve. […]
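
The lognormal calibration parameters above can be checked against the stated geological intervals. The Python sketch below assumes that "mean in real space" M is the mean of the lognormal itself, so the log-scale mean is ln(M) − S²/2 for log standard deviation S, and that the offset is added afterwards. The function name `lognormal_quantile` and the dictionary labels are illustrative.

```python
import math
from statistics import NormalDist


def lognormal_quantile(mean_real, log_sd, offset, q):
    """Quantile q of an offset lognormal prior given its real-space mean.

    Assumption: mean_real is the mean of the lognormal (before the offset),
    so the log-scale mean is ln(mean_real) - log_sd**2 / 2.
    """
    mu = math.log(mean_real) - 0.5 * log_sd ** 2
    z = NormalDist().inv_cdf(q)  # standard normal quantile
    return offset + math.exp(mu + log_sd * z)


# (mean in real space, log stdev, offset) as given in the protocol text.
CALIBRATIONS = {
    "Engraulidae": (1.4, 1.0, 6.0),
    "E. japonicus": (0.465, 1.0, 0.0),
    "Anchovia split": (0.071, 1.0, 2.8),
}

if __name__ == "__main__":
    for name, (mean_real, log_sd, offset) in CALIBRATIONS.items():
        upper = lognormal_quantile(mean_real, log_sd, offset, 0.975)
        print(f"{name}: offset {offset} Ma, 97.5% quantile {upper:.2f} Ma")
```

Under these assumptions the 97.5% quantiles come out at approximately 12.0, 2.0 and 3.1 Ma, which line up with the older bounds of the three calibration intervals.
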

## Pipeline specifications

| Software tools | adegenet, Clustal W, Geneious, Arlequin, Genodive, MEGA, HapStar, ModelTest-NG, MrBayes, PhyML, BEAST |
|---|---|
| Applications | Phylogenetics, Population genetic analysis |
| Organisms | Danio rerio |