Computational protocol: Evidence for positive selection in the gene fruitless in Anastrepha fruit flies

Similar protocols

Protocol publication

[…] DNA was extracted following the modified protocol of [], in which the exoskeletons were maintained intact for future morphological analyses. We amplified a region from the end of the BTB domain to next the end of first exon of the connecting region of fru (C3 exon) (Figure ), using degenerate primers created from homologous sequences of closely related species: (5'-AGTTCGCTGCCGATGTTYCTCAA-3' and 5'-GACAGRCACTAYCCGCAGGACTCTCAG-3'). This region was amplified by PCR from genomic DNA in a thermocycler PTC-200 (BioRad) using an admixture of Taq polymerase and Pfu polymerase to reduce incorporating errors []. PCR products were purified by PEG 8000 precipitation [] and cloned with InsTAclone kit (Fermentas). At least two recombinant colonies were sequenced with forward and reverse M13 primers using the DYEnamic™ET dye terminator kit (GE Healthcare) and resolved either in a MegaBace 1000 (GE Healthcare) or in an ABI 3730 (Applied). DNA sequencing was mostly carried out at MACROGEN INC, Korea. Quality of base-calling was visually inspected in Chromas version 2.31 The GenBank accession numbers for the 97 sequences from A. fraterculus, A. sororcula and A. obliqua are [HQ003715 - HQ003811. We used the translation of these Anastrepha exon sequences to proteins to align this region to more distantly related taxa. The protein alignments were used as reference to correct alignments of nucleotide sequences, which were used in the phylogenetic tree estimation. Sequences were aligned and visually inspected using Clustal W in BioEdit Sequence Alignment Editor software []. When sequenced clones from an individual differed by less than 3 mutations, additional recombinant colonies (up to five total) were sequenced to confirm results. [...] To reconstruct the phylogenetic tree an optimal nucleotide substitution model was determined by Akaike information criterion (AIC) using MODELTEST ver. 3.7 [] implemented in the HyPHy package ver 0.95 beta []. A phylogenetic tree using sequences from fru C3 exon was estimated by maximum likelihood using the software PhyML ver. 3.0 [] under the TIM+G nucleotide substitution model estimated previously. For the phylogenetic tree reconstruction we used one sequence from each of the Anastrepha species studied in this work (A. fraterculus [GenBank accession number: HQ003715], A. sororcula [GenBank accession number: 1376936] and A. obliqua [GenBank accession number: HQ003765]) to represent the Tephritidae lineage, and 10 different sequences from Drosophilidae available in GenBank: D. simulans - [AF297054.1]; D. sechellia - [AF297055.1]; D. melanogaster - [D84437.1]; D. erecta - [AF298222.1]; D. yakuba - [AF297056.1]; D. heteroneura - [AF051668.1]; D. silvestris - [AF051665.1]; D. grimshawi - [AF105124.1]; D. virilis - [AY028967.1] and D. pseudoobscura - [AF297059.1]. [...] In order to investigate selective pressures that modeled the evolution of fru, we performed a relaxed branch-site test and a strict branch-site test [,] using the software CODEML, implemented in PAML ver. 4 []. The nonsynonymous/synonymous substitutions rate ratios (dN/dS = ω) were measured to infer the selective pressure at the protein level. A ω > 1 at a specific site would indicate positive selection, because nonsynonymous substitutions would have higher fixation probabilities than synonymous mutations due to selective advantages. On the other hand, a ω < 1 would indicate purifying selection, caused by selective constraints at the codon position. The branch-site tests consider the phylogenetic tree to test different selective scenarios. The phylogenetic tree was separated in foreground branches, at which positive selection is tested, and background branches, represented by the other lineages. We established the Anastrepha branch as foreground and the other Drosophila species as background. Positive selection was inferred by a contrast of hypotheses using a maximum likelihood approach. Both relaxed branch-site test and strict branch-site test use the MA model as alternative model, in which the codons in the foreground were allowed to have ω > 1, and the codons in background were constrained to ω ≤ 1. The relaxed branch-site test null model (M1a) assumes the same evolutionary rates for all sites and branches, with all sites varying ω values from 0 and 1. On the other hand, the strict branch-site test fixes the ω > 1 category to 1 in the null model (restricted MA), that is, all sites with ω > 1 are forced to evolve neutrally (ω = 1). Positive selection is inferred when a log likelihood ratio test (LRT) of these values results in a significant value. To assess significance of the LRT we used a chi-squared null distribution composed of a mixture of point mass 0 and χ12. Under such null distribution the critical values at 0.05 and 0.01 levels were 2.71 and 5.41, respectively []. Bayes Empirical Bayes [] method was used in conjunction with the branch-site test to estimate which sites were under the influence of positive selection. We used the MA model parameters to estimate the Bayes Empirical Bayes posterior probabilities. Because some models are prone to show a problem of lack of convergence in a likelihood framework, we ran the analyses twice with different initial ω values.Positive selection was also investigated through the MM01 method of McClellan et al. [] which evaluates whether nonsynonymous substitutions favored or not structural or functional changes in the protein. The analyses were carried out in TreeSAAP version 3.2 [,,] and considered the changes in many physicochemical properties brought forth by each nonsynonymous substitution. A global deviation from neutrality is verified by a goodness-of-fit test between a neutral expected distribution and the observed distribution of the selected physicochemical properties []. Furthermore, TreeSAAP also separates the magnitude of nonsynonymous changes in a range going from conservative to very radical substitutions, according to the change in specific physicochemical properties. The lowest classes (1 to 3) represent the more conservative changes and the highest classes (6 to 8) represent the more radical changes []. McClellan et al. [] conservatively defined stabilizing selection as a selection that tends to maintain the original biochemical attributes of the protein, despite the fact of the inference of positive selection, and destabilizing selection as a selection that favors structural and functional shifts in a region of a protein. In this way, positive-destabilizing selection represents a signature of molecular adaptation.We considered 8 categories of magnitude change for the analysis and followed the categorization given in McClellan et al. []. Only amino acid properties identified by significant positive z-score in the magnitude categories 6, 7 or 8 were considered to be affected by positive-destabilizing selection. To verify which specific regions were affected by positive-destabilizing selection, we performed a sliding window analysis using the amino acid properties which were significant for this type of change. Sliding windows of 20 amino acid length with a sliding step of one codon were selected for showing the best signal-to-noise ratio []. [...] General diversity indexes, such as haplotype (Hd) and nucleotide (π) diversity [], number of polymorphic sites, synonymous nucleotide diversity (πs) and nonsynonymous nucleotide diversity (πa) were calculated using DnaSP version 5 []. The significance of comparison among diversity indexes were accessed by a two sample z-test for comparison between means []. fru C3 exon and I3 intron haplotype networks were inferred by TCS v 1.21 software [] using statistical parsimony with a 95% connection significance [], and was manually converted to Newick tree format for some of the neutrality tests. Because recombination interferes with phylogenetic inferences, we performed three different methods to detect recombination events: GENECONV [] and RDP both implemented in RDP version 3b14 []. Tajima´s D [], Fu and Li´s D and F [] and Fay and Wu's H [] neutrality tests were performed in DnaSP version 5 [] using a sequence from Drosophila melanogaster (GenBank access number: D84437.1) as an outgroup for the fru C3 exon analysis and from Ceratitis capitata (GenBank access number: AF124047.1) for intron analyses. We used different outgroup sequences because the C. capitata sequence available on GenBank included only a small portion of the C3 exon, and the D. melanogaster intron showed higher divergence in relation to the sequences obtained in this work, and, as a consequence, its correct alignment was impaired.One test commonly used to analyze populational data is the McDonnald-Kreitman's test [], which contrasts fixation and polymorphism levels of synonymous and non-synonymous substitutions for two species in a contingency table []. Our data cannot be subjected to McDonnald-Kreitman test since there is no fixed interspecific variation for any of the species considered. We may, however, use a modification of the test proposed by Templeton [] which implements the contrast of synonymous and nonsynonymous substitutions in tip and interior haplotypes (populational equivalents to the young and old haplotypes contrast of McDonnald-Kreitman test, respectively). Under neutrality synonymous and nonsynonymous substitutions are expected to occur in a same rate both in tip and interior haplotypes, whereas at purifying or positive selection such rate are biased toward synonymous or nonsynonymous substitutions, respectively. We propose here an alternative test of selection which evaluates whether synonymous substitutions have a greater probability of surviving in the population when compared to nonsynonymous substitutions. In a neutrally evolving region, we expect that the probabilities of survival of synonymous and nonsynonymous substitutions through time should come from the same probability distribution. On the other hand, if the region is under purifying selection, we expect that nonsynonymous substitutions have a higher chance of being eliminated [], whereas if the region is under positive selection, we would find several advantageous nonsynonymous substitutions with higher probability of surviving and spreading in the population []. In order to detect this pattern in selection signals, we evaluate the number of haplotypes that derived from each mutation directly from a haplotype network. Because recent mutations, such as those present as singletons or doubletons in the tips of the haplotype network, may not yet have passed through the evolutionary test of survival and reproduction over time [] and are more affected by drift and chance, we only consider mutations present in at least three haplotypes. To contrast the differential survival probabilities of synonymous and nonsynonymous substitutions in the population, we ranked internal synonymous and nonsynonymous mutations in the haplotype network according to the number of descendent haplotypes derived from them and calculate the difference in synonymous and nonsynonymous ranks summation (R1 and R2) in an improved normal approximation to the Mann-Whitney test []. If synonymous and nonsynonymous mutations come from the same distribution, we do not expect to see a significant difference in their ranks, i.e., in neutrality, we expect that synonymous and nonsynonymous mutations should have similar probabilities of survival in a population, and therefore, similar ranks. On the other hand, if nonsynonymous mutations have been selected against, we expect that synonymous mutations would be on average older, and therefore have higher ranks than nonsynonymous mutations, and the opposite would be true if the region is under positive selection. Because it has been shown that tests of selection that rely on the comparison of rates of nonsynonymous to synonymous substitutions have limited power when contrasting sequences with little differentiation [,], this test is adequate to look for patterns of selection at the population level by using information derived from the phylogenetic relationships amongst the haplotypes. […]

Pipeline specifications

Software tools Chromas, Clustal W, BioEdit, ModelTest-NG, HyPhy, PhyML, PAML, DnaSP
Applications Phylogenetics, Sanger sequencing
Organisms Drosophila melanogaster