Computational protocol: Molecular Evolution of Immune Genes in the Malaria Mosquito Anopheles gambiae

Protocol publication

[…] Nucleotide diversity (π) was estimated using DnaSP 4.10. The 95% confidence interval (CI) of π was estimated by bootstrapping over positions in programs written in SAS (SAS Institute Inc., 1990). To evaluate whether recombination rate differed between genes and influenced their diversity, the recombination parameter (R = 4Nr) between adjacent nucleotide positions was estimated for each gene using DnaSP. A more complete summary of polymorphism was obtained from the site frequency spectrum, which describes the frequency of sites that are invariant (f = 0), singleton (f = 1), and polymorphic (f = 2, 3, … n/2), where f is the frequency of the rarer nucleotide at a site and n is the number of sequences. These spectra distinguish between rare mutations (e.g., singletons) and common mutations (sites where the rarest nucleotide was observed 4–7 times, which is the maximum possible frequency given 9–14 sequences per population). The frequency of neutral mutations increases slowly compared with that of positively selected mutations but faster than that of deleterious mutations. Hence, rare mutations represent a greater fraction of new and mildly deleterious mutations, whereas common ones represent a greater fraction of ancient and neutral mutations. The site frequency spectrum is especially useful for comparing polymorphism in different regions of a gene without bias due to PCR errors, because it accounts for sequence length variation. We compared and tested the equality of nucleotide diversity at synonymous and nonsynonymous sites using bootstrapping in MEGA 3.1.
The Hudson, Kreitman and Aguadé test (HKA test) compares within-species polymorphism and between-species divergence at two (or more) loci, accommodating different rates of neutral polymorphism between loci. This test was designed to detect positive and positive-balancing selection. It was performed using DnaSP.
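The diversity estimates described above can be illustrated with a minimal Python sketch. It is not the DnaSP or SAS code used in the study: it assumes a gap-free alignment of equal-length sequences, a simple percentile bootstrap over alignment columns, and a folded spectrum keyed by the count of the rarest nucleotide at each site. All function names and the toy alignment are hypothetical.

```python
import random
from collections import Counter


def site_diversity(column):
    """Fraction of sequence pairs that differ at one alignment column."""
    n = len(column)
    total_pairs = n * (n - 1) // 2
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(column).values())
    return (total_pairs - identical_pairs) / total_pairs


def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise difference averaged over all columns."""
    per_site = [site_diversity(col) for col in zip(*seqs)]
    return sum(per_site) / len(per_site)


def bootstrap_pi_ci(seqs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI of pi, resampling alignment columns with replacement."""
    rng = random.Random(seed)
    per_site = [site_diversity(col) for col in zip(*seqs)]
    length = len(per_site)
    estimates = sorted(
        sum(rng.choices(per_site, k=length)) / length for _ in range(n_boot)
    )
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def folded_sfs(seqs):
    """Number of sites per f, the count of the rarest nucleotide (0 if invariant)."""
    spectrum = Counter()
    for col in zip(*seqs):
        counts = Counter(col)
        f = min(counts.values()) if len(counts) > 1 else 0
        spectrum[f] += 1
    return dict(spectrum)


# Toy alignment (made up for illustration): 4 sequences, 8 sites, 2 singletons.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy)
ci_low, ci_high = bootstrap_pi_ci(toy)
print(pi, (ci_low, ci_high), folded_sfs(toy))
```

Resampling columns rather than sequences mirrors the "bootstrapping over positions" wording; with real data the number of replicates and the handling of gapped or missing sites would need to match the original analysis.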
The McDonald and Kreitman test (1991) compares the ratios of fixed to polymorphic substitutions at nonsynonymous and silent (both synonymous and noncoding, NC) sites between species. Under neutrality, these ratios are expected to be equal, whereas positive selection would increase the rate of fixation at nonsynonymous sites. This test was performed using DnaSP.
Differentiation between populations was assessed by sequence-based F statistics analogous to Wright's F statistics, calculated and tested (for being greater than zero) by a permutation test using DnaSP. Confidence intervals around FST values were calculated by bootstrapping over nucleotide positions using programs written in SAS. To avoid the effect of unequal sample size due to pooling four An. gambiae populations compared with single populations of An. arabiensis and An. quadriannulatus, inter-species comparisons were performed using the population of An. gambiae from western Kenya, which is sympatric with An. arabiensis. The binomial test (which estimates the probability of obtaining the observed number of significant tests at the 0.05 level given the total number of tests) was used to detect significant departures from the null hypothesis across multiple tests, such as between pairwise population comparisons across genes.
The evolutionary relationship between the sibling species is not fully resolved, probably because introgression between An. gambiae and An. arabiensis affected genes unprotected by fixed inversions. Because of the uncertain phylogeny and introgression, we did not classify mutations as ancestral, shared, or derived, and our selection analysis relied on within-gene comparisons. Comparisons between different functional regions of a gene (defined below) and between synonymous and nonsynonymous mutations provide robust evidence for selection and avoid the confounding effects of population demography, inversions, introgression, and PCR errors, because these factors affect all regions of the gene equally. Likewise, such comparisons are not susceptible to variation in mutation and recombination rates between unlinked loci across the genome. This approach is conservative because polymorphism in shorter DNA fragments is subject to higher sampling variation, reducing the power to detect differences between regions. Physical linkage between adjacent regions may further reduce the differences between them even if selection operated on only one region. The advantage of this approach, however, is that significant differences represent robust evidence for selection.
Tests of positive selection on single codons were performed using the codeml program in the PAML 3.15 package. It estimates the per-site ratio of nonsynonymous to synonymous substitutions in every codon along the branches of a phylogenetic tree by fitting nested maximum likelihood models with different parameters. Analyses were performed on coding regions of all homologous genes from the family Culicidae available in GenBank (searched using tblastx) and all unique sequences obtained in this study. The GNBP alignment was 171 aa long and included eight species (An. gambiae, An. arabiensis, An. quadriannulatus, Ae. aegypti, Ae. albopictus, Ae. triseriatus, Cx. quinquefasciatus, and Armigeres subalbatus). The SP14D1 alignment was 246 aa long and included six species (An. gambiae, An. arabiensis, An. quadriannulatus, Ae. aegypti, Cx. quinquefasciatus, and Ar. subalbatus). The Gambicin alignment was 81 aa long and included nine species (An. gambiae, An. arabiensis, An. quadriannulatus, An. funestus, An. darlingi, Ae. aegypti, Cx. quinquefasciatus, Cx. pipiens, and Ar. subalbatus). The Defensin alignment was 101 aa long and included seven species (An. gambiae, An. arabiensis, An. quadriannulatus, An. funestus, An. darlingi, Ae. aegypti, and Ar. subalbatus). Multiple alignment of coding regions was done using ClustalW, followed by hand alignment, before removal of all gaps.
For GNBP and SP14D1, pairwise local alignments were obtained with tblastx instead of ClustalW, and the final alignment was performed manually in GeneDoc (version 2.700). Neighbor-joining trees were produced using the program Neighbor (PHYLIP 3.66) based on a distance matrix computed by Dnadist (PHYLIP 3.66), run under default parameters. […]
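The McDonald and Kreitman comparison and the binomial test across multiple tests, both described in this protocol, reduce to small calculations that can be sketched in Python. This is an illustration, not the DnaSP procedure: the excerpt does not say which statistic was applied to the 2×2 table, so Fisher's exact test is assumed here, and the binomial test is assumed to be a one-sided upper-tail probability. The function names and all counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    For a McDonald-Kreitman table (an assumed layout):
        a = fixed nonsynonymous,       b = fixed silent,
        c = polymorphic nonsynonymous, d = polymorphic silent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def binomial_upper_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more significant tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts, for illustration only.
mk_p = fisher_exact_two_sided(8, 10, 3, 30)
# Hypothetical: 4 of 24 pairwise comparisons significant at the 0.05 level.
multi_p = binomial_upper_tail(4, 24)
print(mk_p, multi_p)
```

A small `multi_p` would suggest more significant tests than expected by chance at the 0.05 level, which is the logic the protocol applies to pairwise population comparisons across genes.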

Pipeline specifications

Software tools: DnaSP, MEGA, HKA, PAML, TBLASTX, Clustal W, PHYLIP
Applications: Phylogenetics, Population genetic analysis, Nucleotide sequence alignment
Organisms: Anopheles gambiae, Homo sapiens, Anopheles arabiensis
Diseases: Malaria