Computational protocol: CenH3 evolution reflects meiotic symmetry as predicted by the centromere drive model

Protocol publication

[…] We obtained all 191 CenH3 sequences from the GenBank and Joint Genome Institute databases using BLAST searches. The sources and accession numbers for all sequences are supplied in .

Because alignment quality is crucial for selection analyses, we used the BAli-Phy software, which accounts for alignment uncertainty and avoids biasing alignments towards guide trees, because alignments and phylogenetic trees are estimated simultaneously. The codon alignments and phylogenetic trees were jointly inferred in BAli-Phy v2.3.5 using the M0 substitution model and the RS07 model for indels. Because the ciliate nuclear code is not implemented in BAli-Phy v2.3.5, Tetrahymena sequences were aligned at the amino acid level using the LG substitution model and the RS07 model for indels, and then back-translated to nucleotide sequences. For each of the seventeen clades, we ran ten independent chains until they converged and then pooled the results. We masked each codon with a reliability score below 80% as “NNN” prior to selection analyses, and we used the maximum a posteriori tree for all selection analyses except for primates, where a known species tree was used. All BAli-Phy alignments of CenH3 before and after masking the unreliable residues are supplied in . When only a partial sequence of CenH3 was available, we treated the missing part of the gene as missing data and replaced the gaps with “?”. The alignments and phylogenetic trees that we used for selection analyses are supplied in .

Once we had alignments and phylogenetic trees of the CenH3 sequences for each of the seventeen clades, we employed codon substitution models to infer the selective pressures acting on the protein from the non-synonymous/synonymous substitution rate ratio (dN/dS = ω). Non-synonymous substitutions in a codon lead to amino acid changes, while synonymous substitutions do not.
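The distinction between the two substitution classes can be illustrated with a minimal Python sketch. It is not part of the authors' pipeline, in which dN/dS is estimated by likelihood-based codon models; it only classifies a single codon change. The function name `classify_substitution` is hypothetical. The sketch covers the standard genetic code and the ciliate nuclear code (in which TAA and TAG encode glutamine), since the latter applies to the Tetrahymena sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Ciliate nuclear code: identical except that TAA and TAG encode glutamine (Q).
CILIATE_CODE = dict(STANDARD_CODE, TAA="Q", TAG="Q")


def classify_substitution(codon_from, codon_to, code=STANDARD_CODE):
    """Label a codon change as identical, synonymous, or non-synonymous."""
    a, b = codon_from.upper(), codon_to.upper()
    if a == b:
        return "identical"
    return "synonymous" if code[a] == code[b] else "non-synonymous"


print(classify_substitution("CTT", "CTC"))                # Leu -> Leu
print(classify_substitution("GAT", "GAA"))                # Asp -> Glu
print(classify_substitution("CAA", "TAA", CILIATE_CODE))  # Gln -> Gln in ciliates
```

The last call shows why the choice of genetic code matters: the same change from CAA to TAA would introduce a stop codon under the standard code.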
If there is no selective pressure (neutral evolution), non-synonymous and synonymous substitutions are expected to occur at the same rate, so ω = 1. Purifying selection, which conserves the protein sequence, is indicated by ω < 1, and positive selection, which favors substitutions that change amino acids, is indicated by ω > 1. If purifying selection is relaxed, ω tends to rise towards 1. Likewise, if positive selection is relaxed, ω tends to decrease towards 1.

We inferred three positive selection measures for each of the seventeen analyzed clades: (i) the proportion of positively selected branches in the tree, (ii) the proportion of positively selected codons in the alignment, and (iii) the overall ω ratio. To determine the frequency of positively selected branches, we ran the data through the branch-site random effects likelihood (BS-REL) model of codon substitution. BS-REL allows ω to vary across both codons and branches and infers selective regimes independently for each branch of a given phylogeny, pooling information across all codons. To assess the frequency of positively selected codons, we analyzed CenH3 from each of the seventeen clades using the mixed effects model of evolution (MEME), a codon substitution model capable of identifying instances of positive selection at the level of individual codons. Both analyses were performed using the BS-REL and MEME modules as implemented on the DataMonkey web server. Finally, we examined the overall ω ratio to evaluate CenH3 evolution across all codons and branches for each clade using a one-ratio (M0) model with the codon frequencies option set to F3×4 in the codeml module of PAML 4.7.

To assess whether asymmetrics and symmetrics differ significantly in the frequency of positively selected branches and codons and in the overall ω ratio, we employed the Mann-Whitney U test.
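As an illustration of the test itself, the following minimal sketch computes the U statistic and an exact two-sided p-value by enumerating every split of the pooled values. The source does not state which software or variant of the test the authors used, so this is an assumption-laden sketch. The function names and the input values are hypothetical and are not data from the study.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs in which a value from x exceeds one from y (ties count 0.5)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p-value) by enumerating all group splits."""
    pooled = list(x) + list(y)
    n_x = len(x)
    mean_u = n_x * len(y) / 2  # expected U under the null hypothesis
    observed = u_statistic(x, y)
    observed_dev = abs(observed - mean_u)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as far from the null expectation as the data.
        if abs(u_statistic(group_x, group_y) - mean_u) >= observed_dev - 1e-12:
            extreme += 1
    return observed, extreme / total


# Made-up proportions of positively selected branches, for illustration only.
asymmetrics = [0.10, 0.15, 0.30]
symmetrics = [0.00, 0.02, 0.05]
u, p = mann_whitney_exact(asymmetrics, symmetrics)
print(f"U = {u}, exact two-sided p = {p}")
```

Full enumeration is feasible at this scale: splitting seventeen values into two groups yields at most 24,310 combinations.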
However, phylogenetic relationships among the analyzed clades may violate the statistical independence of values that the Mann-Whitney U test requires. Therefore, we also assessed the differences between asymmetrics and symmetrics using phylogenetically corrected statistical analyses. For that purpose, we constructed a dated phylogenetic tree of all seventeen clades using divergence times from TIMETREE. The statistical significance of the phylogenetically corrected differences between asymmetrics and symmetrics was inferred using phylogenetic generalized linear models (pgls) as implemented in the pgls function of the caper R package. Pgls analyses were performed in R 3.2.3. The R script used for the pgls analyses is supplied in . […]
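The idea behind the phylogenetic correction can be sketched as follows. Under a Brownian motion model of trait evolution, the expected covariance between two tips of a dated tree is proportional to the branch length they share from the root, and a covariance matrix of this kind is what lets a phylogenetic regression account for non-independence. The Python sketch below only builds such a matrix. It is not the caper implementation or the authors' R script, and the tree encoding, clade names, and divergence times are hypothetical.

```python
def bm_covariance(tree):
    """Return (tip names, covariance dict) for a tree under Brownian motion.

    A node is a (content, branch_length) pair, where content is either a tip
    name (str) or a list of child nodes.
    """
    cov = {}

    def visit(node, depth_above):
        content, length = node
        depth = depth_above + length  # distance from the root to this node
        if isinstance(content, str):
            cov[(content, content)] = depth  # tip variance: root-to-tip distance
            return [content]
        tips_by_child = [visit(child, depth) for child in content]
        # Tips in different child subtrees share the path from the root to here.
        for i, left in enumerate(tips_by_child):
            for right in tips_by_child[i + 1:]:
                for a in left:
                    for b in right:
                        cov[(a, b)] = cov[(b, a)] = depth
        return [tip for tips in tips_by_child for tip in tips]

    return visit(tree, 0.0), cov


# Hypothetical dated tree (branch lengths in million years): clade_A and
# clade_B split 30 My ago; their ancestor split from clade_C 100 My ago.
example_tree = (
    [
        ([("clade_A", 30.0), ("clade_B", 30.0)], 70.0),
        ("clade_C", 100.0),
    ],
    0.0,
)

tips, cov = bm_covariance(example_tree)
for a in tips:
    print(a, [cov[(a, b)] for b in tips])
```

In this toy tree, clade_A and clade_B share 70 of their 100 million years of history and are therefore expected to covary strongly, whereas clade_C shares none with either.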

Pipeline specifications