Computational protocol: Contemporary gene flow between wild An. gambiae s.s. and An. arabiensis

Protocol publication

[…] Genotype calls were made with Beadstudio v3.2 (Illumina Inc.), with all calls checked manually. Although predominantly female samples were used (199 female An. gambiae, seven An. arabiensis from Jinja, eight An. arabiensis from Tororo and 11 hybrids), six An. arabiensis samples from Jinja (of 13) were males. Since both males and females were studied, X-chromosome SNPs were excluded from the analysis. From a total of 736 reliably scoreable SNPs on the array, 462 autosomal SNPs were identified that were polymorphic and exhibited ≤20% missing data in each sample group (each species and hybrids). These 462 SNPs were used for all analyses (Additional file : Figure S1). FST and diversity statistics for each SNP were calculated from genotypes of PCR diagnostically-pure species using GenAlEx 6.5, and the distances among individual multilocus genotypes were visualised using principal coordinates analysis (PCoA), also in GenAlEx 6.5, with default settings. Individual multilocus genotypes comprising SNPs on chromosome 3 and chromosome arm 2R (see Results) served as input for STRUCTURE 2.3.4 and BAPS 6 clustering and genomic admixture analyses. Though normally applied as alternatives, these two methods were used together because STRUCTURE provides estimated admixture proportions for every individual, whereas BAPS provides admixture proportions only if some evidence of mixture is detected (otherwise a zero is returned) but also provides a probability for a hypothesis of no admixture. The admixture algorithm first estimates which multilocus genotypes show evidence of mixture and the proportion of the genome attributed to each source population, followed by simulation of multilocus genotypes from allele frequencies to determine the posterior probability that putatively mixed genotypes could be found in the source population.
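The SNP filter described above (polymorphic, and no more than 20% missing data in each sample group) can be illustrated with a minimal Python sketch. The genotype coding (alternate-allele counts 0, 1, 2, with None for a missing call), the function name keep_snp, and the choice to test polymorphism across all groups pooled are assumptions made for illustration only; the original filtering was done on Beadstudio calls and its exact implementation is not given in the source.

```python
from typing import Dict, List, Optional

# Assumed coding for this sketch: number of alternate alleles (0, 1 or 2),
# with None standing for a missing genotype call.
Genotype = Optional[int]


def keep_snp(calls_by_group: Dict[str, List[Genotype]],
             max_missing: float = 0.20) -> bool:
    """Return True if no group exceeds max_missing and the SNP is polymorphic.

    Polymorphism is assessed on the pooled calls (an assumption): the
    alternate allele must be present but not fixed.
    """
    observed: List[int] = []
    for calls in calls_by_group.values():
        if not calls:
            return False  # an empty group cannot satisfy the missing-data rule
        missing_fraction = sum(c is None for c in calls) / len(calls)
        if missing_fraction > max_missing:
            return False
        observed.extend(c for c in calls if c is not None)
    if not observed:
        return False
    alt_alleles = sum(observed)
    # Diploid autosomal SNP: 2 alleles per called genotype.
    return 0 < alt_alleles < 2 * len(observed)


# Hypothetical calls for one SNP in the three sample groups.
example = {
    "gambiae": [0, 0, 1, None, 0],
    "arabiensis": [2, 2, 2, 2, 2],
    "hybrid": [1, 1, 1, 1, 1],
}
print(keep_snp(example))
```

Applying such a function to every autosomal SNP would yield the retained marker set; X-linked SNPs would be dropped beforehand, as in the protocol.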
For STRUCTURE, admixture was estimated from the mean of ten replicates with 10,000 iterations for burn-in and 20,000 for data collection, with k set to two in every run (to capture each species’ samples; STRUCTURE was not applied to determine the optimum number of clusters). In BAPS, multiple runs with k set from 2 to 20 were undertaken to obtain optimum clustering solutions. Settings for the admixture analysis were 100 iterations, 200 or 1000 reference individuals for simulations (see below) for observed data, and 20 iterations for the reference individuals. Since ‘pure’ species determined by single-locus diagnostics might actually be mixed genotypes, we performed an outlier analysis for each set of ‘pure’ species data. Using the proportionate mixture estimates from all data from STRUCTURE, we calculated the median of the absolute deviations from the grand median and multiplied it by a constant (b = 1.4826), which assumes normally distributed data, to yield a median absolute deviation metric (MAD). Outliers were considered as data points whose mixture value was more extreme than 3 × MAD (in the direction of the alternate species), which represents a conservative threshold. This method has the advantage over those utilising means and standard deviations of being relatively insensitive to the influence of any outliers in the detection process; calculations were performed in Excel. BAPS admixture analysis was then performed using An. arabiensis and An. gambiae, following exclusion of outliers, as predefined populations and the outliers and hybrids as the test samples.
Simulations of expected mixture proportions for various classes of hybrid were conducted in Hybridlab. Observed genotype data for the ‘pure’ species samples (i.e. excluding outliers) were first used to generate 100 simulated genotypes of each, which served as the data for production of F1, F2, F3 and first to third generation backcrosses.
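The MAD outlier screen applied to the ‘pure’ species samples was carried out in Excel; the following Python sketch shows one way the same calculation could be written. The function names, the direction argument, and the example values are hypothetical, and the reading that an outlier is a point whose deviation from the median exceeds 3 × MAD on the side of the alternate species is an interpretation of the text rather than a confirmed detail.

```python
from statistics import median
from typing import List

B = 1.4826  # scaling constant for normally distributed data


def mad(values: List[float]) -> float:
    """Scaled median absolute deviation: B * median(|x - median(x)|)."""
    med = median(values)
    return B * median([abs(v - med) for v in values])


def outliers_towards_other_species(q: List[float],
                                   direction: int = -1,
                                   threshold: float = 3.0) -> List[int]:
    """Return indices of individuals deviating by more than threshold * MAD.

    q holds each individual's estimated ancestry proportion from its own
    species (assumed layout). direction = -1 flags only values below the
    median, i.e. shifted towards the alternate species; +1 flags values above.
    """
    med = median(q)
    cutoff = threshold * mad(q)
    return [i for i, v in enumerate(q) if direction * (v - med) > cutoff]


# Hypothetical STRUCTURE proportions for seven 'pure' individuals.
q_values = [0.99, 0.98, 0.99, 0.97, 0.99, 0.98, 0.60]
print(outliers_towards_other_species(q_values))  # index 6 is flagged
```

Because both the centre and the spread are medians, the single extreme value (0.60) barely affects the cutoff, which is the robustness property the protocol cites. Note that if more than half of the values are identical the MAD is zero, in which case any deviation would be flagged.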
One hundred simulated genotypes were produced for each hybrid class for admixture analysis in BAPS, with the simulated ‘pure’ species genotypes as predefined reference populations. To evaluate detection power for each hybrid class we calculated the percentage of significantly mixed individuals, the mean admixture proportion, and its deviation from the relevant theoretical expectation: 0.5 for F1, F2, F3; 0.25 for first generation backcrosses (bx1), 0.125 for bx2, and 0.0625 for bx3. Admixture proportions of significantly mixed observed genotypes falling within the range of simulated values were considered potentially representative of the hybrid class. Thus genotypes could in some cases be considered potential members of multiple classes, in which case their precise hybrid class status could not be determined. […]
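The evaluation step above can be sketched in Python as follows. The theoretical expectations follow directly from the text (0.5 for F1 to F3, halving with each backcross generation); the function names, the input layout, and the decision to average admixture proportions over all simulated individuals of a class are assumptions, since the source does not specify how the summaries were computed.

```python
from typing import Dict, List


def expected_proportion(hybrid_class: str) -> float:
    """Theoretical admixture proportion: 0.5 for F1-F3, 0.5**(n+1) for bxn."""
    if hybrid_class in ("F1", "F2", "F3"):
        return 0.5
    if hybrid_class.startswith("bx"):
        generation = int(hybrid_class[2:])
        return 0.5 ** (generation + 1)
    raise ValueError(f"unknown hybrid class: {hybrid_class}")


def summarise_class(name: str, proportions: List[float],
                    significant: List[bool]) -> Dict[str, float]:
    """Detection power summary for one simulated hybrid class."""
    mean_q = sum(proportions) / len(proportions)
    return {
        "pct_significant": 100.0 * sum(significant) / len(significant),
        "mean": mean_q,
        "deviation": mean_q - expected_proportion(name),
    }


def candidate_classes(observed_q: float,
                      simulated: Dict[str, List[float]]) -> List[str]:
    """Classes whose simulated range (min to max) contains the observed value."""
    return [name for name, values in simulated.items()
            if min(values) <= observed_q <= max(values)]


# Hypothetical simulated admixture proportions for three classes.
simulated_q = {
    "F1": [0.45, 0.55],
    "bx1": [0.18, 0.33],
    "bx2": [0.05, 0.20],
}
print(candidate_classes(0.19, simulated_q))  # falls in both bx1 and bx2 ranges
```

An observed value returning more than one class corresponds to the case described in the protocol where the precise hybrid class cannot be determined.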

Pipeline specifications

Software tools: GenAlEx, BAPS, Hybridlab
Applications: Phylogenetics, Population genetic analysis
Organisms: Anopheles gambiae
Diseases: Malaria