[…] in motor model at or above the 'gathering threshold' (score = -135; expectation value < 2 × 10⁻⁴). However, for phylogenetic reconstructions, highly divergent sequences cause problems with both sequence alignment and tree inference [], and we found that inclusion of the most divergent kinesin sequences hindered tree reconstruction (data not shown). For this reason, 166 sequences with scores < 100 (expectation value > 10⁻²⁵), representing the most divergent sequences, were excluded from phylogenetic analyses (Additional file ). The remaining 1458 sequences were trimmed to 80 aa either side of the kinesin motor domain (as defined by the Pfam model) and the motor domains were aligned using MAFFT 6.24 [], adopting the E-INS-i strategy []. This alignment was then trimmed to well-aligned blocks (330 characters), and we reduced redundancy in the dataset by removing 195 sequences from duplicated genes that encode proteins predicted to be identical or nearly identical (>95% identity at the amino acid level) to other sequences from the same organism. Both untrimmed and trimmed alignments are available in Additional file and , respectively.

Bayesian phylogenies were inferred from the protein alignment using the Metropolis-coupled Markov chain Monte Carlo (MCMCMC) method as implemented in the program MrBayes 3.1.2 []. The WAG substitution matrix was used [] with gamma-distributed variation in substitution rate, approximated by 4 discrete categories, and with the shape parameter estimated from the data (mean α = 0.927). Ten runs were performed, each consisting of 4 Markov chains heated to a 'temperature' of 0.2 and run for 12,000,000 generations. All runs were initiated from a starting tree inferred from BLASTp scores as described in [], a strategy that gave significantly better stationary phase tree likelihoods than those using starting trees inferred by either maximum parsimony or neighbor-joining (data not shown). Chains were sampled every 8,000 generations.
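To make the run settings above concrete, the following minimal Python sketch works through the arithmetic they imply. It is illustrative bookkeeping only, not the analysis itself. It assumes the incremental heating scheme documented for MrBayes, in which chain i (with i = 0 for the cold chain) has heat 1 / (1 + T × i); the function and variable names are hypothetical.

```python
from typing import List

# Values taken from the text; names are illustrative only.
N_CHAINS = 4                # Markov chains per run
TEMPERATURE = 0.2           # heating 'temperature' T
N_GENERATIONS = 12_000_000  # generations per run
SAMPLE_EVERY = 8_000        # sampling interval in generations


def chain_heats(n_chains: int, temperature: float) -> List[float]:
    """Heat of each chain, assuming heat_i = 1 / (1 + T * i), i = 0 for the cold chain."""
    return [1.0 / (1.0 + temperature * i) for i in range(n_chains)]


def samples_per_run(n_generations: int, sample_every: int) -> int:
    """Number of sampled states per run, not counting the starting state."""
    return n_generations // sample_every


heats = chain_heats(N_CHAINS, TEMPERATURE)
n_samples = samples_per_run(N_GENERATIONS, SAMPLE_EVERY)

print([round(h, 3) for h in heats])  # [1.0, 0.833, 0.714, 0.625]
print(n_samples)                     # 1500
```

Under these assumptions, each run contains one cold chain and three progressively flatter heated chains, and yields 1,500 sampled states after the starting tree.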
Two runs that had not reached an apparent stationary phase by the halfway point were discarded. For the remaining 8 runs, the first 6,400,000 generations of each were discarded as burn-in and the remaining generations were used to construct the majority-rule consensus tree shown in Additional file .

Since the scale of the phylogenetic analysis (1263 sequences) made bootstrap replication unfeasible, we tested the level of support […]

