Computational protocol: Horizontal acquisition of multiple mitochondrial genes from a parasitic plant followed by gene conversion with host mitochondrial genes


Protocol publication

[…] Sequence alignments were constructed with MUSCLE version 3.7 and manually adjusted when necessary using BioEdit version 7.0.9. For pseudogenes with frameshifting indels, the sequences were restored to their ancestral reading frames by comparison with functional gene copies from closely related sequences. This was necessary for calculations of synonymous and non-synonymous sequence divergence. Poor-quality regions of the alignments were excluded using Gblocks version 0.91b with relaxed parameters: the minimum number of sequences for a flank position (b2) was set to 50%, the minimum block length (b4) to 5, and the maximum number of species with gaps (b5) to 50%.

For some analyses, as indicated in the text, predicted sites of C-to-U RNA editing were eliminated by converting them to T in the data sets. To predict edit sites, data set sequences were first aligned to published cDNA sequences from Arabidopsis thaliana, Beta vulgaris, Citrullus lanatus, Vitis vinifera, Oenothera berteriana (for atp6 and matR) or O. biennis (for atp1), and Oryza sativa (for atp1 and atp6) or Zea mays (for matR). Edit sites were then predicted in the data set sequences by comparison to the cDNA sequences using PREP-Aln with a cutoff score of 0.2, and all predicted sites were converted to T.

Phylogenetic analyses were performed using the maximum likelihood (ML) approach as implemented in PhyML version 3.0. For each analysis, the general time reversible (GTR) substitution model and subtree pruning and regrafting (SPR) branch swapping were used. The shape parameter of a gamma distribution with four rate categories and the proportion of invariable sites were estimated during the analysis. Each analysis was run five times, starting from different randomized trees. Support for the ML topology was evaluated by bootstrapping with 100 ML replicates.

Pairwise levels of non-synonymous (dN) and synonymous (dS) divergence were calculated with MEGA version 4.0.2.
The Nei-Gojobori method was used with a Jukes-Cantor correction for multiple hits and pairwise deletion of gaps. Standard errors for the pairwise estimates were calculated using the bootstrap method with 500 replicates. Edit site effects were eliminated from the analyses by coding all predicted sites as T in the data sets. Effects of atp1 gene conversion were eliminated from the analysis by removing the affected codons from the data set.

Recombination was detected with OnePop from the OrgConv package. When a detected recombinant segment was longer than 100 nucleotides, phylogenetic trees were reconstructed for both the recombinant region and the remaining sequence using PhyML as described above, and incongruence between the regions was examined using the approximately unbiased (AU) test. Detected recombinant segments were required to have a P-value < 0.001 to be considered significant, and segments longer than 100 nucleotides were additionally required to have a P-value < 0.05 in the AU test. […]
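
The two-step acceptance rule for recombinant segments described above can be sketched in Python as follows. This is only an illustration of the thresholds stated in the text, not code from the original study. The `Segment` class, its field names and `accept_segment` are hypothetical, and the sketch assumes that the P-value < 0.001 criterion refers to the P-value from the detection step and that a long segment with no AU test result is not accepted.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text above.
DETECTION_P_CUTOFF = 0.001   # P-value required for a detected segment
AU_P_CUTOFF = 0.05           # AU test P-value required for long segments
AU_MIN_LENGTH = 100          # segments longer than this also need the AU test


@dataclass
class Segment:
    """Hypothetical record for one detected recombinant segment."""
    length: int                    # segment length in nucleotides
    detection_p: float             # assumed: P-value from the detection step
    au_p: Optional[float] = None   # AU test P-value, if the test was run


def accept_segment(seg: Segment) -> bool:
    """Apply the two-step significance rule to one segment."""
    # Step 1: every segment must pass the detection threshold.
    if seg.detection_p >= DETECTION_P_CUTOFF:
        return False
    # Step 2: segments longer than 100 nt must also pass the AU test.
    if seg.length > AU_MIN_LENGTH:
        # Assumption: a missing AU result counts as a failure.
        return seg.au_p is not None and seg.au_p < AU_P_CUTOFF
    return True


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    candidates = [
        Segment(length=60, detection_p=0.0005),
        Segment(length=250, detection_p=0.0001, au_p=0.01),
        Segment(length=250, detection_p=0.0001, au_p=0.20),
    ]
    for seg in candidates:
        print(seg, accept_segment(seg))
```

Keeping the cutoffs as named constants makes it easy to see which values come from the text and to vary them if other thresholds are of interest.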

Pipeline specifications