Computational protocol: Assessing Whether Alpha-Tubulin Sequences Are Suitable for Phylogenetic Reconstruction of Ciliophora with Insights into Its Evolution in Euplotids

Protocol publication

[…] The degree of sequence divergence between paralogs in ciliates is not clear. In the present investigation, we follow the criterion of a previous study, which defines sequences that diverge by more than 2% as paralogs, to allow for sequence errors introduced by repeated PCR and cloning. Under this approach, recent paralogs may be confounded with allelic diversity and some paralogs may be missed, but this should not substantially bias our interpretations.

Five data sets were included in the phylogenetic analyses:
(1) Atub_n74: alpha-tubulin nucleotide sequences including the first two codon positions (74 sequences in total);
(2) Atub_aa: alpha-tubulin amino acid sequences (70 sequences in total);
(3) Atub-SSU: a two-gene combined data set including all euplotid species available (the paralog with the shortest branch length was selected for alpha-tubulin) and the other spirotrichean species of data set Atub_n74, except for Discocephalus ehrenbergi and Histriculus histrio for SSU-rDNA, and D. rotatorius and H. cavicola for alpha-tubulin (52 sequences in total);
(4) SSU: SSU-rDNA sequences including all taxa in data set Atub-SSU (52 sequences in total);
(5) Atub_n52: alpha-tubulin nucleotide sequences with the first two codon positions, including all taxa in data set Atub-SSU (52 sequences in total).

For the phylogenetic analyses, 27 alpha-tubulin gene sequences from GenBank were used in addition to those newly sequenced in the present study. The sequences were aligned using ClustalW as implemented in BIOEDIT 7.0.0, and the alignments were further refined manually in BIOEDIT. The final alignments used for subsequent phylogenetic analyses included 710 positions (Atub_n74), 355 positions (Atub_aa), 2,303 positions (Atub-SSU) and 1,593 positions (SSU), respectively. GTR+I+G was the best-fitting model for the nucleotide data set (Atub_n74), selected by AIC as implemented in MrModeltest v2, and Blosum62+I+G was the best-fitting model for the amino acid data set (Atub_aa), selected by AIC as implemented in ProtTest 1.4.
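The 2% paralog criterion and the restriction to the first two codon positions can be illustrated with a minimal Python sketch. The text does not state how divergence was measured, so the uncorrected p-distance used here is an assumption, as are the function names and the premise that each coding sequence starts in frame. This is not the authors' code.

```python
PARALOG_THRESHOLD = 0.02  # from the text: sequences diverging by more than 2%


def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Assumption: divergence is measured as a simple p-distance, ignoring any
    column where either sequence has a gap or an ambiguous base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def is_paralog_pair(seq_a, seq_b, threshold=PARALOG_THRESHOLD):
    """True when two sequences diverge by more than the threshold."""
    return p_distance(seq_a, seq_b) > threshold


def first_two_codon_positions(cds):
    """Drop every third base, assuming the sequence starts at codon position 1."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)
```

Under these assumptions, an in-frame alignment of 355 codons yields 2 × 355 = 710 retained nucleotide positions, which matches the reported lengths of Atub_aa and Atub_n74. The combined Atub-SSU length is likewise the sum 710 + 1,593 = 2,303.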
Maximum likelihood (ML) analyses with 1,000 bootstrap replicates were conducted using RAxML-HPC v7.2.7. A Bayesian inference (BI) analysis was performed with MrBayes 3.1.2 using the GTR+I+G model selected by MrModeltest 2 under the AIC criterion. Markov chain Monte Carlo (MCMC) simulations were run with two sets of four chains using the default settings, with a chain length of 1,500,000 generations and trees sampled every 100 generations. The first 3,000 trees were discarded as burn-in. The remaining trees were used to generate a majority-rule consensus tree and to calculate the posterior probabilities (PP) of all branches. Phylogenetic trees were visualized with TreeView v1.6.6 and MEGA 4.

Congruence of the different data partitions (in this case, genes) was tested with both the incongruence length difference (ILD) test and the Shimodaira-Hasegawa (S-H) test as implemented in PAUP* 4.0b10. PAUP* 4.0b10 was also used to generate constraint trees, and the resulting trees were compared with the unconstrained ML tree using the approximately unbiased (AU) test as implemented in the CONSEL package. […]
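The burn-in and posterior probability steps of the Bayesian analysis can be illustrated with a short Python sketch. It is a toy model of the bookkeeping, not MrBayes itself: each sampled tree is represented as a set of clades, every function name is hypothetical, and treating the 3,000-tree burn-in as applying to each run separately is an assumption.

```python
from collections import Counter


def sampled_trees(generations, sample_every, include_start=True):
    """Number of trees saved by one run; the starting tree is counted by default."""
    return generations // sample_every + (1 if include_start else 0)


def clade_posteriors(trees, burnin):
    """Frequency of each clade among the post-burn-in trees.

    `trees` is a list of sampled trees, each given as a set of clades,
    and each clade is a frozenset of taxon names (a toy representation).
    """
    kept = trees[burnin:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors, cutoff=0.5):
    """Keep only clades found in more than `cutoff` of the retained trees."""
    return {clade: pp for clade, pp in posteriors.items() if pp > cutoff}


# Settings reported in the text: 1,500,000 generations, sampled every 100.
per_run = sampled_trees(1_500_000, 100)
retained_per_run = per_run - 3_000  # first 3,000 trees discarded as burn-in
```

With the reported settings, 3,000 discarded trees correspond to roughly the first 20% of the samples in each run. A clade's posterior probability is then its frequency among the retained trees, and the majority-rule consensus keeps clades with a frequency above 0.5.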

Pipeline specifications