Computational protocol: A congruent phylogenomic signal places eukaryotes within the Archaea

[…] The rRNA and 29BAE protein alignments were based on those of Foster et al. [], but updated with the relevant sequences from Naegleria gruberi, Korarchaeum cryptofilum, Caldiarchaeum subterraneum and Nitrosoarchaeum limnia. Sequences were aligned with Meta-Coffee, and poorly aligning regions were identified and removed using Gblocks or TrimAl, as described in the main text. To prepare the 64AE alignments, we clustered the proteomes of the selected taxa with the Markov Cluster algorithm and built maximum likelihood trees from the initial clusters to identify single-copy orthologues, which were used for phylogenetic analysis. Further details of the sequence selection and alignment protocol are provided in the electronic supplementary material.

[...] We used Concaterpillar v. 1.5 [] and Conclustador v. 0.1a [] to test whether our single-gene alignments were congruent before concatenating them for phylogenetic analysis. In cases where these two methods disagreed, we built phylogenies for all of the congruent sets inferred by both methods and compared the results. To complement and expand upon these approaches, we developed a novel method, based on geodesic distances, for analysing the level of incongruence in a set of genes and for comparing incongruence between sets of trees inferred under different models; this method is described in detail in the electronic supplementary material.

[...] Best-fitting substitution models were chosen for the rRNA alignments using jModelTest []. For the protein alignments, single-matrix substitution models were chosen using the ProteinModelSelection script available from the RAxML website. Maximum likelihood calculations were performed with RAxML v. 7.2.8 []. Bayesian Markov Chain Monte Carlo analyses were performed using the p4 (Foster []) and PhyloBayes v. 3.3 [] packages, which implement the range of more complex models used in our analyses.
Convergence was assessed by comparing the results from independent runs, and model fit in the Bayesian analyses was evaluated using posterior predictive simulations []. Further details of the models and simulations used are provided in the electronic supplementary material. […]
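
The selection of single-copy orthologues from the initial Markov Cluster output, described above for the 64AE alignments, can be illustrated with a simple copy-number pre-filter. This is only a sketch and not the authors' procedure, which relied on maximum likelihood trees of the clusters. It assumes that each cluster is one whitespace-separated line of sequence identifiers and that identifiers take the hypothetical form `taxon|gene`; the function name and the `min_taxa` parameter are likewise illustrative.

```python
from collections import Counter


def single_copy_clusters(cluster_lines, taxa, min_taxa=None):
    """Keep clusters in which every represented taxon contributes one sequence.

    cluster_lines: iterable of strings, one cluster per line, members
        separated by whitespace (assumed layout of the clustering output).
    taxa: set of taxon labels selected for the analysis.
    min_taxa: minimum number of distinct taxa a cluster must contain;
        defaults to all selected taxa.
    """
    required = len(taxa) if min_taxa is None else min_taxa
    kept = []
    for line in cluster_lines:
        members = line.split()
        if not members:
            continue
        # ASSUMPTION: identifiers look like "taxon|gene".
        counts = Counter(member.split("|", 1)[0] for member in members)
        if set(counts) - set(taxa):
            continue  # cluster contains a taxon outside the selected set
        if max(counts.values()) == 1 and len(counts) >= required:
            kept.append(members)
    return kept


# Toy input: three clusters over three hypothetical taxa A, B and C.
clusters = [
    "A|g1\tB|g7\tC|g3",         # one copy per taxon
    "A|g2\tA|g4\tB|g9\tC|g5",   # taxon A is duplicated
    "A|g6\tB|g8",               # single-copy, but taxon C is missing
]
taxa = {"A", "B", "C"}
complete = single_copy_clusters(clusters, taxa)
partial = single_copy_clusters(clusters, taxa, min_taxa=2)
```

A count-based filter of this kind cannot distinguish orthologues from paralogues that happen to be single-copy, which is one reason to inspect a tree for each candidate cluster before accepting it.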
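
The two checks mentioned above, comparing independent runs and evaluating model fit by posterior predictive simulation, can each be reduced to a small calculation. The sketch below is illustrative and does not reproduce the p4 or PhyloBayes implementations. It assumes that every sampled tree has already been converted to a set of splits, each split being a `frozenset` of the taxa on one consistently chosen side of an internal branch, and that a test statistic has been computed for the observed data and for each simulated data set. All function names are hypothetical.

```python
from collections import Counter


def split_frequencies(sampled_trees):
    """Fraction of sampled trees that contain each split."""
    counts = Counter(split for tree in sampled_trees for split in tree)
    n = len(sampled_trees)
    return {split: count / n for split, count in counts.items()}


def max_split_difference(run_a, run_b):
    """Largest absolute difference in split frequency between two runs.

    Values near 0 suggest that the runs sampled similar tree
    distributions; a split absent from one run counts as frequency 0.
    """
    freq_a = split_frequencies(run_a)
    freq_b = split_frequencies(run_b)
    all_splits = set(freq_a) | set(freq_b)
    return max(
        (abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) for s in all_splits),
        default=0.0,
    )


def posterior_predictive_p(observed, simulated):
    """Proportion of simulated statistics at least as large as the observed one."""
    if not simulated:
        raise ValueError("at least one simulated value is required")
    return sum(value >= observed for value in simulated) / len(simulated)


# Toy example with two splits over hypothetical taxa.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
run_a = [{ab}, {ab}, {ab}, {ac}]
run_b = [{ab}, {ab}, {ac}, {ac}]
difference = max_split_difference(run_a, run_b)
p_value = posterior_predictive_p(2.0, [1.0, 2.5, 3.0, 1.5])
```

A posterior predictive p-value close to 0 or 1 indicates that the observed statistic lies in the tail of the simulated distribution, that is, that the model does not reproduce that feature of the data well.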

