Computational protocol: Similar patterns of linkage disequilibrium and nucleotide diversity in native and introduced populations of the pea aphid, Acyrthosiphon pisum

Protocol publication

[…] Sequences were assembled using CodonCode Aligner (CodonCode Corporation, Dedham, MA). Because these were natural pea aphid isolates, all loci had heterozygous positions, and some had heterozygous insertions or deletions (indels). Heterozygous positions identified by the software were confirmed by visual inspection of the sequence trace files. In some cases, heterozygous indels prevented us from sequencing across the entire amplicon; these regions were treated as missing data. In other instances, we were unable to amplify a locus in a given genotype. Figure illustrates the gaps in our data for each of the four loci. All indels were excluded from analyses of nucleotide diversity. Sequences have been deposited in GenBank under accession numbers FJ825706–FJ825751 and FJ858381–FJ858708.

Analyses of LD were performed using the Genetics package [] in the R statistical computing environment []. Our data are genotypic rather than haplotypic; thus, for two loci A and B, the double heterozygote AB/ab is indistinguishable from Ab/aB. The package therefore estimates the haplotype frequency $p_{AB}$ by maximum likelihood and uses it to compute the deviation of the observed from the expected allele-pair frequency, $D = p_{AB} - p_A p_B$, and the correlation coefficient between the allele pairs, $r = -D/\sqrt{p_A p_a p_B p_b}$. We report LD between pairs of sites as the squared allele-frequency correlation, $r^2$. Rare sites (where only one individual differed from the others) were excluded prior to computation. Significance of LD between SNPs was assessed using Fisher's exact test, also implemented in the Genetics package.

To estimate the decay of LD over distance, we used nonlinear regression of LD between polymorphic sites against physical distance []. Under a drift-recombination model, the expected value of $r^2$ is $E(r^2) = 1/(1 + C)$, where $C = 4N_e r$ is the population recombination parameter ($N_e$ is the effective population size, and $r$ here denotes the recombination rate between sites, not the correlation coefficient).
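Because the phase of double heterozygotes is unknown, $p_{AB}$ has to be estimated before $D$, $r$ and $r^2$ can be computed. The Python sketch below illustrates that step under stated assumptions: it uses the textbook expectation-maximization (EM) algorithm for two biallelic loci, which is not necessarily the Genetics package's own implementation, and the function names `estimate_haplotype_freqs` and `ld_stats` and the 0/1/2 genotype coding are hypothetical. It follows the sign convention for $r$ quoted above; $r^2$ does not depend on the sign.

```python
import math
from typing import Dict, List, Tuple

Freqs = Dict[str, float]


def estimate_haplotype_freqs(genotypes: List[Tuple[int, int]],
                             max_iter: int = 1000,
                             tol: float = 1e-12) -> Freqs:
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Sketch only (assumed textbook EM, not the genetics package's code).
    Each genotype is (copies of allele A, copies of allele B), each 0, 1 or 2.
    """
    known = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    n_double_het = 0
    for g_a, g_b in genotypes:
        if g_a == 1 and g_b == 1:
            # AB/ab and Ab/aB are indistinguishable without phase information.
            n_double_het += 1
            continue
        # With at most one heterozygous locus, the phase is unambiguous.
        for copy in (1, 2):
            allele_a = "A" if g_a >= copy else "a"
            allele_b = "B" if g_b >= copy else "b"
            known[allele_a + allele_b] += 1
    n_haps = 2 * len(genotypes)
    freqs = {h: 0.25 for h in known}  # uniform starting values
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies with double heterozygotes split by w.
        new = {
            "AB": (known["AB"] + n_double_het * w) / n_haps,
            "ab": (known["ab"] + n_double_het * w) / n_haps,
            "Ab": (known["Ab"] + n_double_het * (1 - w)) / n_haps,
            "aB": (known["aB"] + n_double_het * (1 - w)) / n_haps,
        }
        delta = max(abs(new[h] - freqs[h]) for h in freqs)
        freqs = new
        if delta < tol:
            break
    return freqs


def ld_stats(freqs: Freqs) -> Tuple[float, float, float]:
    """Return (D, r, r^2) from two-locus haplotype frequencies."""
    p_A = freqs["AB"] + freqs["Ab"]
    p_B = freqs["AB"] + freqs["aB"]
    d = freqs["AB"] - p_A * p_B
    denom = math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r = -d / denom  # sign convention quoted in the text; r^2 is unaffected
    return d, r, r * r


# Usage: 4 AB/AB, 4 ab/ab and 2 double-heterozygous individuals.
genos = [(2, 2)] * 4 + [(0, 0)] * 4 + [(1, 1)] * 2
D, r, r2 = ld_stats(estimate_haplotype_freqs(genos))
```

In this example the EM assigns the two double heterozygotes almost entirely to the AB/ab phase, so $r^2$ ends up very close to 1.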
Adding mutation to the model, the expectation becomes []

$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right],$$

where $n$ is the number of sequences sampled. We fit this model using PROC NLIN in SAS [], after replacing $C$ with $C \times d$, where $d$ is the distance in base pairs between sites, so that $C$ is estimated per base pair.

We calculated π, θ, and Tajima's D using R. π measures nucleotide diversity as the average number of nucleotide differences per site over all pairwise comparisons of sequences, and θ estimates the level of heterozygosity from the number of variable (segregating) positions [,]. Tajima's D tests whether the frequency distribution of segregating sites at a locus is consistent with neutral expectations []. Levels of population differentiation ($F_{ST}$) at each of the four loci were estimated using Genepop [], which uses a weighted analysis of variance []. We also used Genepop to test whether any of the loci deviated from Hardy-Weinberg equilibrium, using the exact Hardy-Weinberg test of the null hypothesis of random union of gametes []. […]
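The nonlinear fit of the mutation-drift-recombination expectation of $r^2$ against distance can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS code: PROC NLIN is swapped for a one-dimensional golden-section search over $\log_{10} C$, which assumes the sum of squared residuals has a single minimum in the search range, and the names `hill_weir_r2`, `golden_section_min` and `fit_c_per_bp` and the bounds `c_min`/`c_max` are hypothetical.

```python
import math
from typing import Callable, Sequence


def hill_weir_r2(c: float, n: int) -> float:
    """Expected r^2 for recombination parameter c and n sampled sequences."""
    first = (10 + c) / ((2 + c) * (11 + c))
    second = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return first * second


def golden_section_min(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-10, max_iter: int = 200) -> float:
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_c_per_bp(distances: Sequence[float], r2_values: Sequence[float], n: int,
                 c_min: float = 1e-6, c_max: float = 10.0) -> float:
    """Least-squares estimate of per-bp C, with C replaced by C * distance.

    Assumption: golden-section search stands in for PROC NLIN here.
    """
    def sse(log_c: float) -> float:
        c = 10 ** log_c
        return sum((r2 - hill_weir_r2(c * dist, n)) ** 2
                   for dist, r2 in zip(distances, r2_values))

    return 10 ** golden_section_min(sse, math.log10(c_min), math.log10(c_max))


# Usage: recover C = 0.01 per bp from noise-free synthetic values (n = 20).
dists = [10, 50, 100, 250, 500, 1000, 2000]
r2_obs = [hill_weir_r2(0.01 * dist, 20) for dist in dists]
c_hat = fit_c_per_bp(dists, r2_obs, n=20)
```

Searching on a log scale lets one bracket span several orders of magnitude of $C$; for noisy real data, a grid or multi-start search would be a safer guard against local minima.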

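The three diversity statistics can be illustrated with the Python sketch below. It assumes phased, gap-free sequences of equal length (how the diploid genotypes were handled in R is not stated here), takes θ to be Watterson's estimator based on the number of segregating sites, and computes Tajima's D with Tajima's standard normalizing constants; the function name `diversity_stats` is hypothetical. With this formulation the variance term is zero for fewer than four sequences, so the sketch requires $n \ge 4$.

```python
import math
from itertools import combinations
from typing import List, Tuple


def diversity_stats(seqs: List[str]) -> Tuple[float, float, float]:
    """Return (pi per site, Watterson's theta per site, Tajima's D).

    Sketch only: assumes aligned, phased sequences without gaps.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for Tajima's D")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Average number of pairwise differences (k-hat) over all n(n-1)/2 pairs.
    pair_diffs = [sum(x != y for x, y in zip(s1, s2))
                  for s1, s2 in combinations(seqs, 2)]
    k_hat = sum(pair_diffs) / len(pair_diffs)

    # Segregating sites: alignment columns with more than one nucleotide.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pi = k_hat / length
    theta_w = seg_sites / a1 / length
    if seg_sites == 0:
        return pi, theta_w, float("nan")  # D is undefined without variation

    # Tajima's normalizing constants for the variance of (k_hat - S / a1).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k_hat - seg_sites / a1) / math.sqrt(variance)
    return pi, theta_w, tajima_d


# Usage: four 10-bp sequences, three of which carry one private (singleton) variant.
sample = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAACA", "AAAAAAAGAA"]
pi, theta_w, tajima_d = diversity_stats(sample)
```

Here π = 0.15 and θ ≈ 0.164 per site; because the sample is dominated by singletons, θ exceeds π and Tajima's D is negative, the direction expected when rare variants are in excess relative to neutral expectations.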
Pipeline specifications

Software tools: CodonCode Aligner, Genepop
Application: Population genetic analysis
Organisms: Acyrthosiphon pisum