Computational protocol: Evolutionary study of duplications of the miRNA machinery in aphids associated with striking rate acceleration and changes in expression profiles

Protocol publication

[…] Chromatograms were analyzed and assembled using the Staden Package 2.0b []. Nucleotide sequences were aligned with Clustal X 2.0.9 [], and the corresponding amino acid alignments were obtained with MEGA 4.0.2 []. Phylogenetic trees were inferred from the nucleotide and amino acid alignments by maximum likelihood, maximum parsimony and Bayesian inference. The ago-1 analyses were run separately for each of the three regions as well as for the concatenated alignments. Phylogenetic analyses of nucleotide sequences were restricted to coding regions. The model of sequence evolution for the maximum likelihood analyses was selected with jModelTest [] for nucleotides and ProtTest [,] for amino acids. Maximum likelihood trees were reconstructed with PhyML [] using NNI (nearest-neighbor interchange) tree search. Maximum parsimony analyses were run in PAUP* 4.0b10 [] with TBR (tree bisection and reconnection) branch swapping and 5000 random-addition-sequence replicates. Node support for the maximum likelihood and maximum parsimony trees was assessed by bootstrapping [] with 200 and 1000 pseudoreplicates, respectively. Bayesian inference was carried out with MrBayes 3.1 [,]. Two independent runs were performed, each with one cold and three heated chains. Runs of 10⁶ generations (nucleotides) and 5 × 10⁵ generations (amino acids) were sufficient for the two runs to converge, as checked with Tracer v1.5 []. Trees were sampled every 100th generation, the first 25% of the samples were discarded as burn-in, and posterior probabilities were computed from the remaining trees. [...] SH tests [] and ELW tests [] were performed in TREE-PUZZLE to compare alternative scenarios for the phylogenetic timing of the dcr-1 duplication in aphids. Seven hypotheses were tested simultaneously in each test.
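Nonparametric bootstrapping, used above for node support, builds each pseudoreplicate by sampling alignment columns with replacement until the original alignment length is reached; the tree search is then repeated on every pseudoreplicate, and the support for a node is the percentage of replicate trees that contain it. The minimal sketch below shows only the resampling step on a toy alignment. The function name `bootstrap_replicates` and the example sequences are hypothetical; PhyML and PAUP* generate their replicates internally.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Yield bootstrap pseudoreplicates of an aligned set of sequences.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    Each replicate samples alignment columns with replacement, keeps the
    original number of columns, and applies the same columns to every taxon.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Column indices drawn with replacement: some repeat, some are left out.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        yield {taxon: "".join(seq[i] for i in columns)
               for taxon, seq in alignment.items()}


# Hypothetical toy alignment; 200 replicates mirrors the ML setting above.
toy = {"taxonA": "ATGGCT", "taxonB": "ATGGCA", "taxonC": "ATAGCT"}
replicates = list(bootstrap_replicates(toy, 200, seed=1))
print(len(replicates), replicates[0])
```

Resampling whole columns, rather than individual characters, keeps the sites of each taxon aligned, which is what makes each pseudoreplicate a valid alignment.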
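The number of trees each MrBayes run contributes to the posterior sample follows directly from the chain length and the sampling frequency. The short calculation below counts the samples per run and how many remain after the 25% burn-in for the two chain lengths reported above. It assumes that the starting state at generation 0 is also recorded (hence the extra sample) and that burn-in is applied to each run before the two runs are pooled; the helper name `retained_samples` is hypothetical.

```python
def retained_samples(n_generations, sample_freq=100, burnin_fraction=0.25):
    """Return (samples per run, burn-in samples, retained samples per run).

    Assumes one sample at generation 0 and then one every `sample_freq`
    generations up to and including `n_generations`.
    """
    total = n_generations // sample_freq + 1
    burnin = int(total * burnin_fraction)
    return total, burnin, total - burnin


for label, ngen in [("nucleotides", 10**6), ("amino acids", 5 * 10**5)]:
    total, burnin, kept = retained_samples(ngen)
    # Two independent runs contribute trees to the pooled posterior sample.
    print(f"{label}: {total} samples/run, {burnin} discarded, {2 * kept} pooled trees")
```

Under these assumptions the nucleotide analysis keeps 7501 trees per run and the amino acid analysis 3751 trees per run.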
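The ELW test ranks competing hypotheses (here, tree topologies for the timing of the dcr-1 duplication) by their expected likelihood weights. For each bootstrap replicate, sites are resampled with the RELL approximation (reusing per-site log-likelihoods instead of re-optimizing each tree), each hypothesis gets a weight proportional to its likelihood, and the weights are averaged over replicates; the confidence set contains the top-ranked hypotheses whose weights add up to 95%. The sketch below illustrates this idea under those assumptions; it is not TREE-PUZZLE's code, the function names are hypothetical, and the SH test is not shown.

```python
import math
import random


def expected_likelihood_weights(site_lnl, n_boot=1000, seed=None):
    """Expected likelihood weights from per-site log-likelihoods (RELL bootstrap).

    site_lnl: dict mapping hypothesis name -> list of site log-likelihoods,
    all computed on the same alignment (same length, same site order).
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    rng = random.Random(seed)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(n_boot):
        # RELL: resample site indices instead of re-optimizing each tree.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        lnl = {n: sum(site_lnl[n][i] for i in idx) for n in names}
        best = max(lnl.values())
        # Subtract the maximum before exponentiating to avoid underflow.
        rel = {n: math.exp(v - best) for n, v in lnl.items()}
        norm = sum(rel.values())
        for n in names:
            totals[n] += rel[n] / norm
    return {n: totals[n] / n_boot for n in names}


def confidence_set(weights, level=0.95):
    """Smallest set of top-ranked hypotheses whose weights sum to >= level."""
    chosen, cumulative = [], 0.0
    for name, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


# Hypothetical per-site log-likelihoods for three competing hypotheses.
toy_lnl = {"H1": [-2.0] * 50, "H2": [-2.1] * 50, "H3": [-2.5] * 50}
weights = expected_likelihood_weights(toy_lnl, n_boot=20, seed=0)
print(weights, confidence_set(weights))
```

With seven hypotheses, as in the analysis above, the dictionary would simply hold seven entries of site log-likelihoods.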
The seven hypotheses were constructed by permuting the groups attached to the oldest nodes of the maximum likelihood tree inferred from the dcr-1 amino acid alignment. [...] The nature and distribution of the selective pressures acting on the ago-1 and dcr-1 gene duplicates were evaluated with PAML 4.4 [] on the nucleotide coding regions. The analyses were run separately for each ago-1 region and for dcr-1, excluding outgroup sequences. Branch models were used to test whether the ratio of non-synonymous to synonymous substitution rates (ω = dN/dS) differs between the -1a and -1b copies of these genes. Likelihood ratio tests (LRTs) were used to compare: i) a “one-ratio” model assuming a single ratio across the whole phylogeny, ii) a “two-ratio” model fitting one ratio for the fast-evolving copies (-1b copies) and another for the slow-evolving copies, and iii) a “free-ratio” model assuming a different ratio for each branch of the tree [,]. In the two-ratio model for ago-1, the ago-1b sequences were set as the fast-evolving copies and the ago-1a sequences as the slow-evolving copies. For dcr-1, the dcr-1b sequences were set as the fast-evolving copies and all remaining aphid dcr-1 sequences (including the dcr-1a copies of Acyrthosiphon) as the slow-evolving copies. Site models were also used to search for positively selected codon positions. Two models were compared by an LRT: M7, which assumes no positively selected sites in the alignment, and M8, which allows for them [,]. This site-model search for positively selected sites was applied only to the -1b copies. Finally, branch-site models were fitted, comparing by an LRT the null model MA, with ω fixed at 1, against the alternative model MA, with ω estimated [,].
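Each of these comparisons is a likelihood ratio test between nested models: the statistic 2ΔlnL = 2(lnL_alternative − lnL_null) is compared with a χ² distribution whose degrees of freedom equal the difference in free parameters. That gives 1 for one-ratio versus two-ratio, 2 for M7 versus M8 (M8 adds the proportion of sites under selection and their ω), the number of branches minus one for free-ratio versus one-ratio, and 1 for the branch-site comparison, where a χ² with one degree of freedom is commonly used as a conservative reference. The sketch below uses only the standard library; the log-likelihoods in the usage line are hypothetical, not results from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        return math.exp(-half) * sum(half**i / math.factorial(i)
                                     for i in range(df // 2))
    # Odd df: erfc term plus a half-integer gamma series.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(half**(i - 0.5) / math.gamma(i + 0.5)
                                  for i in range(1, (df - 1) // 2 + 1))
    return tail


def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio test between nested models: returns (2*dlnL, p-value)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # Tiny negative statistics from optimizer noise are treated as zero.
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical one-ratio vs two-ratio log-likelihoods (df = 1).
print(lrt(-5000.0, -4996.0, 1))
```

For an unrooted tree with n sequences there are 2n − 3 branches, so the free-ratio versus one-ratio comparison would use df = 2n − 4.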
For the branch-site models, both copies were included, and the clade of -1b copies was labeled as the foreground branches, on which positively selected sites are tested. All models allowed the four nucleotide frequencies to vary among codon positions (model F3X4), which gave significantly better likelihoods than a single set of nucleotide frequencies shared by all three positions (model F1X4). […]
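Under F3X4, the equilibrium frequency of a codon is the product of the nucleotide frequencies observed at its first, second and third positions, renormalized over the sense codons; F1X4 is the special case in which the same pooled frequencies are used at all three positions. The sketch below computes F3X4 frequencies from in-frame coding sequences. It assumes the standard (nuclear) genetic code and skips codons containing gaps or ambiguous bases; the function names and toy sequences are hypothetical, and PAML computes these frequencies itself.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard (nuclear) genetic code


def position_frequencies(seqs):
    """Nucleotide frequencies at codon positions 1, 2 and 3 (F3X4 input)."""
    counts = [Counter() for _ in range(3)]
    for seq in seqs:
        seq = seq.upper()
        for start in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[start:start + 3]
            if set(codon) <= set(BASES):  # skip gaps and ambiguity codes
                for pos, base in enumerate(codon):
                    counts[pos][base] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]


def f3x4_codon_frequencies(pos_freqs):
    """Sense-codon frequencies: product of position-specific nucleotide
    frequencies, renormalized after removing stop codons."""
    raw = {}
    for a, b, c in product(BASES, repeat=3):
        codon = a + b + c
        if codon not in STOP_CODONS:
            raw[codon] = pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c]
    total = sum(raw.values())
    return {codon: f / total for codon, f in raw.items()}


# Hypothetical in-frame coding sequences.
pf = position_frequencies(["ATGGCTAAA", "ATGGCAAAG"])
freqs = f3x4_codon_frequencies(pf)
print(len(freqs), round(freqs["ATG"], 4))
```

F3X4 estimates 9 free frequency parameters (3 per position) against 3 for F1X4, which is why the improvement in likelihood has to be judged as significant rather than simply larger.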
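The models above map onto a small set of codeml control-file options: `model` (0 one ratio, 1 free ratio, 2 two or more ratios), `NSsites` (0 for branch models, 7 and 8 for the site models, 2 together with `model = 2` for branch-site model A), `fix_omega`/`omega` (ω fixed at 1 for the branch-site null) and `CodonFreq` (1 for F1X4, 2 for F3X4). The sketch below writes one control file per analysis. The file names, starting values for kappa and ω, and the helper name `codeml_ctl` are hypothetical; the tree file is assumed to already mark the -1b branches with PAML's `#1` foreground label, a step not shown here.

```python
# Settings shared by every run (starting values are hypothetical).
BASE = {
    "seqtype": 1,      # codon sequences
    "CodonFreq": 2,    # F3X4 (1 would be F1X4)
    "icode": 0,        # standard genetic code
    "runmode": 0,      # use the user-supplied tree
    "clock": 0,
    "fix_kappa": 0,
    "kappa": 2,
}

# One entry per model described in the text.
MODELS = {
    "one_ratio": {"model": 0, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "two_ratio": {"model": 2, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "free_ratio": {"model": 1, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "M7_M8": {"model": 0, "NSsites": "7 8", "fix_omega": 0, "omega": 0.5},
    "branch_site_null": {"model": 2, "NSsites": "2", "fix_omega": 1, "omega": 1},
    "branch_site_alt": {"model": 2, "NSsites": "2", "fix_omega": 0, "omega": 1.5},
}


def codeml_ctl(name, seqfile, treefile):
    """Return the text of a codeml control file for one named analysis."""
    settings = dict(seqfile=seqfile, treefile=treefile,
                    outfile=f"{name}.mlc", **BASE, **MODELS[name])
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


# Hypothetical file names; site models used only the -1b copies.
print(codeml_ctl("branch_site_null", "dcr1.phy", "dcr1_labeled.tree"))
```

Keeping the shared settings in one dictionary guarantees that every model in a comparison is fitted with the same codon-frequency model and data, so the LRTs compare only the ω parameterizations.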

Pipeline specifications

Software tools Staden Package, Clustal X, MEGA, jModelTest, ProtTest, PhyML, PAUP*, MrBayes, Tracer, TREE-PUZZLE, PAML
Application Phylogenetics
Organisms Acyrthosiphon pisum