Computational protocol: Founder events and pre-glacial divergences shape the genetic structure of European Collembola species

Protocol publication

[…] Sequences were edited and corrected with Sequencher 4.10 (Gene Codes Corporation, USA). Coding nucleotide sequences (H3 and COI) were translated into protein sequences using the standard and invertebrate mitochondrial codes implemented in Sequencher. For each species, nucleotide and protein sequences were aligned separately and combined (as a concatenated nucleotide matrix of three genes) with Clustal W [] implemented in BioEdit 7.0.1 [].

The best-fit model of sequence evolution for each alignment (COI, 28S, H3, combined matrix) was inferred with the hierarchical likelihood ratio test (hLRT) in TOPALi v2.5 [] using the PhyML algorithm. Phylogenetic trees were calculated with RAxML v7.0.3 [] and MrBayes v3.1.2 []. For maximum likelihood analyses, the model of sequence evolution was GTR + I + G (all four alignments) and 10,000 bootstrap replicates were calculated. For Bayesian inference, the lset parameters were nst = 6 and rates = invgamma (all four alignments); the MCMC (Markov chain Monte Carlo) chains were run for ten million generations and sampled every 1,000th generation. Of the resulting 10,000 samples, the first 2,500 (25 %) were discarded as burn-in.

In the absence of fossil or biogeographic calibration points for the investigated species in Europe, a strict molecular clock for the COI nucleotide alignment was applied in BEAST v1.7.4 []. For constructing trees we used the Yule process [] as the tree prior, because preliminary analyses indicated quicker convergence and higher probabilities and likelihoods than with coalescent tree priors; topologies did not vary among tree priors. The Yule process is also more appropriate for the genetically highly diverged lineages, as the substitution rate among branches is more variable than with coalescent priors.
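The MrBayes settings reported above imply a simple piece of bookkeeping: chain length and sampling interval determine the number of samples, and the burn-in fraction determines how many are discarded. The following Python sketch reproduces that arithmetic and renders a matching MrBayes command block. The function names `mcmc_bookkeeping` and `mrbayes_block` are hypothetical, the sample taken at generation 0 is ignored to match the counts given in the text, and the command block is an illustration of the reported settings, not the authors' actual input file.

```python
def mcmc_bookkeeping(n_generations: int, sample_every: int, burnin_fraction: float) -> dict:
    """Return sample counts for an MCMC run (hypothetical helper).

    The initial state at generation 0 is not counted, matching the
    10,000 samples reported in the text.
    """
    n_samples = n_generations // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return {"samples": n_samples, "burnin": n_burnin, "kept": n_samples - n_burnin}


def mrbayes_block(n_generations: int, sample_every: int, burnin_fraction: float) -> str:
    """Render a MrBayes block with the model settings reported in the text.

    Assumption: burn-in is passed to sump/sumt as a number of samples.
    """
    counts = mcmc_bookkeeping(n_generations, sample_every, burnin_fraction)
    lines = [
        "begin mrbayes;",
        "  lset nst=6 rates=invgamma;",  # GTR + I + G
        f"  mcmc ngen={n_generations} samplefreq={sample_every};",
        f"  sump burnin={counts['burnin']};",
        f"  sumt burnin={counts['burnin']};",
        "end;",
    ]
    return "\n".join(lines)


# Settings from the text: ten million generations, sampled every 1,000th, 25 % burn-in.
counts = mcmc_bookkeeping(10_000_000, 1_000, 0.25)
block = mrbayes_block(10_000_000, 1_000, 0.25)
print(counts)
print(block)
```

Keeping the burn-in as a fraction and deriving the sample count from it makes it easy to reuse the same helper for runs of a different length.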
We used the widely adopted mutation rate for COI in arthropods of 2.3 % pairwise sequence divergence per million years, corresponding to a per-lineage rate of 0.0115 substitutions per site per million years (half the pairwise rate) [, ]. Convergence of the MCMC chain after 600 million generations (sampled every 60,000th generation) with a burn-in of 25 % was confirmed using Tracer v1.4 []. Divergence estimates were calculated with three datasets: (1) all COI and 28S sequences of this study and NCBI combined, with strict clock settings for COI (fixed rate of 0.0115) and estimated rates for 28S, (2) all COI sequences obtained in this study with a strict clock (fixed rate of 0.0115), and (3) all COI sequences of this study extended with sequences from non-European countries obtained from the NCBI and BOLD databases with a strict clock (fixed rate of 0.0115). The extended datasets (2) and (3) included additional sequences from Antarctica, Australia, Canada, Chile, New Zealand, South Africa and northern France and a more detailed outgroup sampling for better estimation of the substitution rate. Outgroups covered additional species within the respective genera and families, i.e. Hypogastrura (five species), Xenylla grisea and Gomphiocephalus hodgsoni for Hypogastruridae, and Folsomia (three species), Parisotoma notabilis, Anurophorus septentrionalis and Isotoma (three species) for Isotomidae (Additional file : Table S3). For F. quadrioculata and I. minor, non-European 28S sequences of the D3-D5 region were not available, and the combined datasets were extended only with the above-mentioned outgroup taxa. The outgroup settings of the two isotomid species were identical, except for five additional sequences of the parthenogenetic species Parisotoma notabilis, included to account for variance in the substitution rate due to the reproductive mode of I. minor.

Median-joining haplotype networks for the nucleotide datasets of COI, H3, 28S and the concatenated dataset were generated for all three species using the program Network 4.6 (Fluxus Technology, Suffolk, Great Britain). Molecular variance (AMOVA) within and between populations and isolation by distance (Mantel test) were analyzed separately for each of the three genes (uncorrected p-distances) in ARLEQUIN [] with 20,000 permutations. To infer the molecular divergence times of C. denticulata, F. quadrioculata and I. minor, we generated a phylogenetic tree of 10 families with 23 genera and eight outgroup taxa, including the taxa studied by D’Haese [], using 28 COI sequences available at NCBI and a strict molecular clock in BEAST as described above. […]
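The relation between the 2.3 % pairwise divergence rate, the fixed clock rate of 0.0115 and the uncorrected p-distances used above can be illustrated with a short Python sketch. It is a back-of-the-envelope calculation only and does not reproduce the BEAST analysis, which estimates node ages on a tree rather than from single pairwise distances. The function names and the rule of skipping sites with gaps or ambiguous bases are assumptions made for this illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences.

    Assumption: sites where either sequence has a gap or a non-ACGT
    character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def per_lineage_rate(pairwise_divergence_per_myr: float) -> float:
    """Convert a pairwise divergence rate into a per-lineage substitution rate."""
    return pairwise_divergence_per_myr / 2


def divergence_time_myr(distance: float, rate_per_lineage: float) -> float:
    """Strict-clock divergence time in million years: t = d / (2 * rate)."""
    return distance / (2 * rate_per_lineage)


rate = per_lineage_rate(0.023)                 # 2.3 % per million years, pairwise
example_distance = p_distance("ACGT", "ACGA")  # toy alignment, not real data
print(rate, example_distance, divergence_time_myr(0.023, rate))
```

Under a strict clock, two lineages accumulate differences along both branches since their split, which is why the pairwise rate is halved to obtain the per-lineage rate and why the distance is divided by twice that rate.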

Pipeline specifications

Software tools: Sequencher, Clustal W, BioEdit, TOPALi, PhyML, RAxML, MrBayes, BEAST, Arlequin
Applications: Phylogenetics, Population genetic analysis
Diseases: Pulmonary Fibrosis