Computational protocol: Positive Selection of Deleterious Alleles through Interaction with a Sex-Ratio Suppressor Gene in African Buffalo: A Plausible New Mechanism for a High Frequency Anomaly

Protocol publication

[…] Multilocus estimates of observed heterozygosity (ML-Ho, equation 2.13 in […]), expected heterozygosity (ML-He, equation 2.12 in […]) and allele frequency were first calculated per locus and subsequently averaged across loci. Standard errors of these multilocus indices were estimated by bootstrapping individuals (1000 samplings), using Excel 2010, and rescaled by multiplying by the ratio of original to bootstrap mean (only a small adjustment, as all bootstrap means were close to the original mean). Ninety-five percent confidence intervals of single-locus estimates of He (PL-He) were estimated with Arlequin 2.000.

We performed population differentiation tests by permuting genotypes among groups, and tests for Hardy-Weinberg equilibrium by permuting alleles among individuals within groups (herds, subpopulations or other groups of individuals), using FSTAT 2.93. Wright's F-statistics were analysed according to Weir and Cockerham's method, using FSTAT 2.93, and pairwise relatedness (Lynch and Ritland's r) per sex or age group was analysed using SPAGeDi 1.4; standard errors of both were estimated by jackknifing across loci. Average opposite-sex relatedness per herd and its standard error were calculated with GenAlEx 6.5.

Missing genotypes are expected to bias per-individual estimates of ML-Ho and number of majority alleles (majority alleles: see section). To minimize this bias, missing single-locus data were replaced by single-locus estimates of mean observed heterozygosity (PL-Ho) and mean number of majority alleles for northern or southern Kruger. Replacing missing data with average values does not bias mean values per individual and results in conservative P-value estimates (Online in […]).

Different alleles at a single locus and different loci in linkage disequilibrium (LD) cannot be considered independent data points in statistical analyses; if uncorrected for, this results in pseudo-replication.
To avoid this type of pseudo-replication, most probabilities were obtained by randomizing complete individual multilocus genotypes among groups of individuals (denoted as P_randomization), using Excel 2010. Randomization at the level of individuals cancelled out pseudo-replication at the lower levels (i.e. alleles and loci), because each randomized data set was influenced by the pseudo-replication to the same extent as the original data set. Probabilities were estimated as twice (in order to arrive at a two-sided P-value) the fraction of random data sets showing the same or a larger value (smaller in case of a negative value) of a statistical parameter (mean, adjusted R², Spearman correlation coefficient, χ²-value) than the original data set, using 100,000 randomizations.

Close relatedness within herds results in pseudo-replication of genotypes, because genotypes of related individuals are not independent of each other. We controlled for this type of pseudo-replication by generating null models, using Excel 2010, whereby observed and expected counts of alleles were first calculated per herd and then summed across herds. The expected counts assumed equal allele frequencies among classes (e.g. LBC and HBC) within herds, while allowing for allele frequency differences among herds, as well as among classes at the population level because herds were weighted differently for each class in relation to the counts per herd. Allele frequencies based on the summed counts, which essentially are frequencies weighted by sample size per herd, were used for calculating differences in statistical parameters between observed and expected counts (similar to a χ² test). Probabilities were estimated by randomizations of complete multilocus genotypes per herd, using Excel 2010 (100,000 samplings). Herds with only one class were excluded from the calculations because these cannot be permuted.
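The per-herd randomization described above can be illustrated with a minimal Python sketch. It is not the authors' Excel implementation. It assumes each individual is summarized by one number derived from its intact multilocus genotype (for example ML-Ho), and it uses the difference in class means as a stand-in statistic instead of the observed-versus-expected allele counts used in the study. The function names, the `(herd, label, value)` record layout and the default of 10,000 randomizations are hypothetical choices for illustration. Shuffling class labels within a herd is equivalent to shuffling whole genotypes among classes within that herd.

```python
import random
from collections import defaultdict


def mean_difference(values, labels, group_a="LBC", group_b="HBC"):
    """Difference in mean value between two classes (stand-in statistic)."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    return sum(a) / len(a) - sum(b) / len(b)


def per_herd_randomization_p(records, n_random=10_000, seed=0):
    """Two-sided randomization P-value with labels shuffled within herds.

    records: list of (herd, label, value) tuples, one per individual.
    Returns (observed statistic, P-value).
    """
    rng = random.Random(seed)
    by_herd = defaultdict(list)
    for herd, label, value in records:
        by_herd[herd].append((label, value))

    # Herds containing only one class cannot be permuted, so drop them.
    kept = {h: rows for h, rows in by_herd.items()
            if len({lab for lab, _ in rows}) > 1}
    herds = sorted(kept)
    values = [v for h in herds for _, v in kept[h]]
    labels = [lab for h in herds for lab, _ in kept[h]]
    observed = mean_difference(values, labels)

    extreme = 0
    for _ in range(n_random):
        shuffled = []
        for h in herds:
            herd_labels = [lab for lab, _ in kept[h]]
            rng.shuffle(herd_labels)  # herd affiliation stays fixed
            shuffled.extend(herd_labels)
        stat = mean_difference(values, shuffled)
        # Same or larger value, or same or smaller if the observed value is negative.
        if (observed >= 0 and stat >= observed) or (observed < 0 and stat <= observed):
            extreme += 1

    # Twice the one-sided fraction gives a two-sided P-value, capped at 1.
    return observed, min(1.0, 2 * extreme / n_random)
```

Because labels never leave their herd, every randomized data set carries the same among-herd differences as the original, which is what lets the comparison isolate within-herd effects.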
Randomization per herd cancelled out pseudo-replication of genotypes within herds, because herd affiliation remained unchanged and therefore each randomized data set was influenced by the pseudo-replication to the same extent as the original data set. Randomization per herd also corrected for population stratification due to herd structure, because it analysed differences between the original data, which reflected differences both within and among herds, and the null models, which reflected only differences among herds. In logistic regression analyses, which were performed with SPSS 19, correction for population stratification was performed by including the latitude of each herd. We chose to include latitude as a continuous variable rather than herd affiliation as a mixed random factor, because allele clines running in a north-south direction were observed for most of the common alleles (see section).

Heterozygosity-fitness correlations (HFCs) were analysed by logistic regressions involving body condition (LBC and HBC), disease status (BTB-positive and BTB-negative), ML-Ho, age, sex, and latitude per herd. Additionally, HFCs were studied by analysing the variation among loci of pairwise group (LBC vs. HBC) differences in PL-He. The role of majority alleles (see section) in HFCs was analysed by logistic regression involving body condition (LBC and HBC), number of homozygous majority alleles per individual, age, sex and latitude per herd. Additionally, χ² tests were performed for associations between body condition and genotype class (majority allele homozygotes, majority allele heterozygotes, homozygotes with no majority allele, heterozygotes with no majority allele). The role of sexual antagonism in HFCs was analysed by sex-specific LBC-HBC group differences in ML-He and correlations between sexes in LBC-HBC allele frequency differences.
The latter correlations were analysed after exclusion of rare alleles with a frequency lower than 0.05 to prevent bias from low sample sizes.

We performed a large number of tests in analysing HFCs, majority alleles and sexual antagonism, some of which are only presented as supporting information. To correct for the large number of tests we calculated Holm-Bonferroni corrected P-values using Holm's weighted procedure. The details of these tests together with the corrected P-values are provided in […]. The results presented in this study are significant after Holm-Bonferroni correction (corrected P<0.05), except for the logistic regression between body condition status and ML-Ho.

Two types of statistical test were performed to show the occurrence of positive selection: associations between allele frequency and effect size, with a positive correlation being indicative of positive selection, and correlations between allele frequency and latitude (allele clines). A χ² test was used to test for an association between haplotype status (presence or absence of haplotype 557) and number of homozygous majority alleles.

Baseline values of PL-He and allele frequency per group (e.g. northern Kruger and southern Kruger) or pair of groups (e.g. LBC vs. HBC and BTB-positive vs. BTB-negative) were obtained by pooling individuals from the group(s) involved. Error bars in figures that represent 95% confidence intervals of binomial proportions were estimated according to Wilson. Stouffer's Z-test was used for combining probabilities from different independent tests. Forward and backward selection in multiple logistic regression analyses always resulted in the same model. Unless otherwise indicated, P-values are two-sided and α-levels are 0.05. Means in the text are reported together with their 95% confidence interval.

The raw data set supporting the results of this article is available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.23d13 (http://datadryad.org/). […]
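Three of the standard procedures named in this section (Holm-Bonferroni correction, Wilson intervals and Stouffer's Z-test) can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not the authors' workflow: the Holm function below is the ordinary equal-weight procedure, not the weighted variant used in the study; the Stouffer function is unweighted and expects one-sided P-values from independent tests; all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def holm_adjust(p_values):
    """Equal-weight Holm-Bonferroni adjusted P-values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P-value is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = _STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def stouffer_combined_p(one_sided_p_values):
    """Unweighted Stouffer's Z: combine one-sided P-values, each in (0, 1)."""
    z_scores = [_STD_NORMAL.inv_cdf(1 - p) for p in one_sided_p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1 - _STD_NORMAL.cdf(z_combined)
```

For example, `wilson_interval(5, 10)` gives an interval that is symmetric around 0.5, and `holm_adjust` returns values that can be compared directly with the 0.05 threshold used in the text.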

Pipeline specifications

Software tools: Arlequin, SPAGeDi
Applications: Phylogenetics, Population genetic analysis
Organisms: Syncerus caffer, Homo sapiens
Diseases: Oculocerebrorenal Syndrome