Computational protocol: Selection on the Drosophila seminal fluid protein Acp62F

Protocol publication

[…] We obtained nucleotide sequences of Acp62F from D. melanogaster, and orthologous sequences from Drosophila simulans, Drosophila sechellia, Drosophila yakuba, and Drosophila erecta, from FlyBase (FlyBase IDs: FBgn0020509, FBgn0043400, FBgn0069552, FBgn0107140, FBgn0237654). Translated sequences were aligned using MUSCLE (Edgar) and back-translated to nucleotide sequences using T-Coffee (Notredame et al.). The resulting multiple sequence alignment is included in the Appendix.

We used three methods to infer recurrent positive selection on Acp62F. The first method implements the “sites” models in codeml, part of the PAML package (Yang). Findlay et al. conducted the same analysis; we replicate it here to confirm their results. The sites models allow ω (dN/dS) to vary among codons within a gene, but assume that all lineages experience the same distribution of ω. Two null models were used, M7 and M8A (Yang et al.), each of which restricts ω to be less than or equal to one, thus disallowing positive selection. M7 assumes a beta distribution for ω, and M8A assumes a beta distribution plus an extra class of sites in which ω = 1. The alternative model, M8, also assumes a beta distribution for ω (restricted to be ≤ 1), but adds a class of sites with ω ≥ 1. M8 can be compared with either null model via a likelihood ratio test (with 2 df for the comparison with M7 and 1 df for the comparison with M8A), and a significant rejection of the null model is evidence in favor of positive selection on a subset of codons.

We additionally used the random-effects likelihood (REL) and fixed-effects likelihood (FEL) methods of Kosakovsky Pond and Frost as more robust tests for positive selection on Acp62F. REL and FEL analyses were performed on the Datamonkey server of the HyPhy (Hypothesis Testing using Phylogenies) package (Pond and Frost). These methods allow variation in both dN and dS among sites, whereas the models implemented in PAML assume a single value of dS across sites.
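The likelihood ratio tests of M8 against M7 (2 df) and against M8A (1 df) described above can be sketched as follows. This is a minimal illustration, not the authors' script: the log-likelihood values are hypothetical placeholders standing in for the lnL values that codeml would report, and the helper names `chi2_sf` and `likelihood_ratio_test` are introduced here for illustration only. The chi-square tail probability is written in closed form for 1 and 2 df so that the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P-value) for nested null and alternative models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, NOT values from the study.
lnl = {"M7": -1250.0, "M8A": -1248.5, "M8": -1245.0}

# Degrees of freedom follow the text: 2 for M7 vs. M8, 1 for M8A vs. M8.
for null_model, df in (("M7", 2), ("M8A", 1)):
    stat, p = likelihood_ratio_test(lnl[null_model], lnl["M8"], df)
    print(f"{null_model} vs. M8: 2*dlnL = {stat:.2f}, df = {df}, P = {p:.4g}")
```

A small P-value in either comparison would correspond to rejecting the null model in favor of M8, that is, to evidence for a class of codons with ω ≥ 1.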
REL assumes predefined distributions for dN and dS and, after initial inference of parameter values, uses an empirical Bayes approach to infer selection on each site. Thus, like PAML, the REL analysis assigns to each codon a posterior probability that it is under positive selection, with higher posterior probabilities indicating greater confidence that selection operates on the given codon. FEL, in contrast, directly estimates dN and dS at each site. Simulation results suggest that REL may be subject to higher false-positive rates for alignments with few species, such as the five-species alignment used here, whereas FEL does not seem to suffer from this problem (Kosakovsky Pond and Frost). For each codon, FEL estimates the probability of obtaining the observed dN and dS values under a neutral model (i.e., a P-value under neutrality), with lower P-values indicating rejection of neutrality in favor of positive selection. […]
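To make the difference between the two per-site readouts concrete, the sketch below flags codons from a table of per-site results. It rests on several assumptions that are not in the source: the `SiteResult` record, the example values, and both cutoffs (P ≤ 0.1 for FEL, posterior probability ≥ 0.95 for REL) are hypothetical and chosen only for illustration. The FEL call additionally requires dN > dS, since a low P-value with dN < dS would point to purifying rather than positive selection.

```python
from dataclasses import dataclass


@dataclass
class SiteResult:
    """Hypothetical per-codon record combining FEL and REL output."""
    codon: int            # codon position in the alignment
    dn: float             # FEL estimate of dN at this site
    ds: float             # FEL estimate of dS at this site
    fel_p: float          # FEL P-value under neutrality
    rel_posterior: float  # REL posterior probability of positive selection


def call_positive_selection(sites, p_cutoff=0.1, posterior_cutoff=0.95):
    """Flag each codon under FEL and REL using illustrative cutoffs."""
    calls = {}
    for s in sites:
        # FEL: neutrality rejected AND the direction is dN > dS.
        fel_call = s.dn > s.ds and s.fel_p <= p_cutoff
        # REL: posterior probability of positive selection is high.
        rel_call = s.rel_posterior >= posterior_cutoff
        calls[s.codon] = {"FEL": fel_call, "REL": rel_call}
    return calls


# Made-up example values, NOT results from the study.
example_sites = [
    SiteResult(codon=10, dn=2.0, ds=0.5, fel_p=0.03, rel_posterior=0.97),
    SiteResult(codon=25, dn=0.1, ds=1.2, fel_p=0.01, rel_posterior=0.02),
    SiteResult(codon=40, dn=1.5, ds=0.9, fel_p=0.40, rel_posterior=0.96),
]

for codon, call in call_positive_selection(example_sites).items():
    print(codon, call)
```

In this made-up example, codon 40 is flagged by REL but not by FEL, which is the kind of disagreement one might expect if REL has a higher false-positive rate on small alignments.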

Pipeline specifications

Software tools: MUSCLE, T-Coffee, PAML, Datamonkey
Databases: FlyBase
Applications: Phylogenetics, Population genetic analysis, Nucleotide sequence alignment
Organisms: Drosophila melanogaster