Computational protocol: Phylogeography of the Wheat Stem Sawfly, Cephus cinctus Norton (Hymenoptera: Cephidae): Implications for Pest Management

Protocol publication

[…] For this phylogenetic study, we focused on seven specimens of C. cinctus collected in Canada, Montana and Colorado. These samples correspond to the most common haplotypes of the two lineages found across North America (see section on Phylogeography of North American Wheat Stem Sawflies).

Polymerase chain reaction (PCR) amplifications were conducted for two mitochondrial gene fragments: a 264-bp region of the cytochrome c oxidase subunit I (COI) and a 442-bp region of the ribosomal 16S RNA (16S). Additional details on primers and PCR conditions can be found in . The purified PCR products were sequenced in both directions by Genoscreen (Lille, France) using an ABI PRISM 377 DNA sequencer. Sequences were edited in CodonCode Aligner (www.codoncode.com) and multiple alignments were performed using CLUSTAL W [] as implemented in CodonCode. The resulting 16S and COI sequence datasets were merged with the homologous regions from the complete mitochondrial genomes of four Cephidae species obtained from NCBI (Calameuta idolon, GenBank accession number KT260168; Calameuta filiformis, KT260167; Cephus pygmaeus, KM377623; and Cephus sareptanus, KM377624).

To estimate divergence among taxa, we used MEGA version 7.0 [] to calculate the average and pairwise genetic p-distances within and among species on single-gene data sets.

Bayesian analysis was conducted using MrBayes 3.1.2 []. First, jModelTest2 [] was used to identify the best-fit model of sequence evolution for each gene, selected using the corrected Akaike Information Criterion (AICc) []. Two simultaneous runs of 3 million generations were performed, and convergence was confirmed by ensuring that the average standard deviation of split frequencies fell below 0.01 and that potential scale reduction factors approached 1.0.
The first 25% of each run was discarded as burn-in before estimating the consensus topology and computing the posterior probability of each node.

The Poisson tree processes (PTP) model for species delimitation [] was used to identify the most likely number of species in the Bayesian phylogeny of the combined dataset. This model estimates the speciation rate directly from the number of substitutions and does not require ultrametric trees as input. The method assumes that each substitution has a small probability of generating a speciation event; consequently, the number of substitutions between species is expected to be significantly higher than within species. The analysis was conducted on the PTP web server (available at http://species.h-its.org/ptp/) with 200,000 MCMC generations, a thinning value of 100 and a burn-in of 25%. As recommended by the developers, the convergence of the MCMC chain was confirmed visually [].

[...] The COI mitochondrial region was partially amplified (762 bp) with the C1-J-2183 and TL2-N-3014 primer pair, as described by Simon et al. [], from 1 to 21 individuals per population. Among the 349 specimens, 270 were collected on wheat and 79 on wildland grasses. Additional details on samples and PCR reactions are provided in and respectively. All sequences were obtained in both forward and reverse directions, assembled into consensus contigs using CodonCode Aligner and then aligned using CLUSTAL W.

Intraspecific phylogenetic relationships were analyzed in two ways. First, a phylogeny was reconstructed by Bayesian analysis using MrBayes 3.1.2, following the methodology described above. Second, a haplotype network was constructed in PopART [] using the TCS method (95% connection limit).

For further analysis, sampling locations containing one or two individuals were pooled with the nearest sampling site, or excluded when they were too isolated (> 30 km).
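The pooling rule for small sampling locations can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that sites are given as latitude/longitude in decimal degrees, that "nearest" means great-circle distance, that a small site may only be pooled into a site that itself has at least three individuals, and that the 30 km limit applies to the distance to that nearest site. The function names `haversine_km` and `pool_small_sites` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def pool_small_sites(sites, min_n=3, max_km=30.0):
    """Map each site to the site it is analyzed with, or None if excluded.

    sites: dict of name -> (lat, lon, n_individuals).
    Assumption: sites with fewer than min_n individuals are pooled into the
    nearest site with at least min_n individuals, if it lies within max_km.
    """
    large = {name: s for name, s in sites.items() if s[2] >= min_n}
    assignment = {}
    for name, (lat, lon, n) in sites.items():
        if n >= min_n:
            assignment[name] = name  # large enough: kept as its own site
            continue
        nearest = min(
            large,
            key=lambda other: haversine_km((lat, lon), large[other][:2]),
            default=None,
        )
        if nearest is not None and haversine_km((lat, lon), large[nearest][:2]) <= max_km:
            assignment[name] = nearest  # pooled with the nearest large site
        else:
            assignment[name] = None  # too isolated: excluded
    return assignment
```

With three hypothetical sites, one holding ten individuals, one holding two individuals about 11 km away, and one singleton several hundred kilometres away, the second is pooled with the first and the third is excluded.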
Gene diversity (Hd) and nucleotide diversity (π) were estimated for each sampling locality with Arlequin 3.5 []. Changes in demographic history were analyzed using two neutrality tests: Tajima’s D [] and Fu’s Fs []. These two frequency-based indicators of population expansion (or of selection in non-neutral markers) were calculated with Arlequin 3.5.

A spatial analysis of molecular variance (SAMOVA) was performed with SAMOVA 1.0 [] to investigate geographical structure. This approach defines groups of populations that are geographically homogeneous and maximally differentiated from each other. The program was run for two to ten groups (K = 2 to K = 10) using 10,000 permutations from 100 random initial conditions. Each group defined by SAMOVA was analyzed separately for its gene diversity (Hd) and nucleotide diversity (π). Allelic richness (r) was computed using the rarefaction method proposed by Petit et al. [] with Contrib (http://www.pierroton.inra.fr/genetics/labo/Software/Contrib).

Finally, a hierarchical analysis of molecular variance (AMOVA) was applied between collections from wildland grasses and wheat fields to test for a host plant effect on the genetic structure of populations. This analysis was conducted with Arlequin 3.5.

[...] Five microsatellite markers described by Hartel et al. [] were used to genotype 539 individuals from 36 sampling sites (). Additional details on genotyping protocols can be found in .

Deviation from Hardy-Weinberg equilibrium and linkage disequilibrium between pairs of loci were tested with Genepop 4.2.1 []. Allelic richness (AR, by the rarefaction method), observed and expected heterozygosity (Ho and He) and inbreeding coefficients (Fis) were estimated using FSTAT 2.9.3.2 [].

To explore the population structure within the whole dataset, we used the Bayesian clustering approach implemented in Structure 2.3.4 [].
An admixture model with correlated allele frequencies was used, and simulations were run with sampling location as a prior because, when genetic divergence is low or the number of loci is limited, this model allows more accurate detection of genetic structure []. Each run consisted of a burn-in of 100,000 iterations followed by 100,000 MCMC iterations. We performed 10 independent runs for each value of K from 1 to 8. We assessed the uppermost level of population structure using the ΔK method [] implemented in Structure Harvester []. The graphical display of genetic structure was produced with Distruct [].

Because traditional methods may be useful for describing population structure and are complementary to Bayesian methods [], we also used two alternative approaches: a Principal Coordinates Analysis (PCoA), calculated via the covariance matrix on standardized data in GenAlEx v 6.5 [], and a population-based neighbor-joining (NJ) tree. Pairwise Cavalli-Sforza and Edwards’ chord distances [] were calculated in Populations 1.2.32 [], and the resulting distance matrix was used to build the NJ tree. The robustness of the nodes was evaluated with 1000 bootstrap replicates over loci. The NJ tree was visualized with FigTree v 1.4.2 [].

The level of genetic differentiation among populations was quantified by estimating pairwise FST. FST values and their statistical significance were estimated using Arlequin 3.5. Isolation by distance (IBD) was examined within the whole dataset and separately for the groups detected with Structure, by testing the correlation between pairwise FST and geographical distances using the Mantel test as implemented in Genepop. […]

Pipeline specifications