Computational protocol: Molecular phylogeny and divergence times of Malagasy tenrecs: Influence of data partitioning and taxon sampling on dating analyses

[…] Sequences were assembled and aligned with the ED editor of the MUST package [], and manually adjusted taking amino acid properties into consideration. Amino acid repeats and sites not sequenced or gapped in more than 25% of the taxa were excluded from the analysis. This resulted in a dataset of 1,101 bp for ADRA2B, 1,161 bp for AR, 852 bp for GHR, and 1,173 bp for vWF. The full data matrix is available from TreeBASE (accession number: M3679). Phylogenetic reconstructions on each gene separately and on the concatenated dataset were performed by maximum likelihood (ML) with PAUP*, version 4b10 [], and by Bayesian analyses with MRBAYES, version 3.1.2 []. The best-fitting model for the ML analyses was selected using the Akaike information criterion (AIC) output of MODELTEST, version 3.7 []. The ML analysis was conducted using a loop approach to estimate the best tree and the optimal likelihood parameters: the parameters and the best tree are re-estimated in turn until both are stable. Node stability was estimated with 100 non-parametric bootstrap replicates []. A major advantage of Bayesian phylogenetic inference is the possibility of partitioning the data, giving each partition its own best-fitting model of sequence evolution. However, overpartitioning may introduce unnecessary sampling variance that could influence the phylogenetic estimates. For each of the twelve possible codon partitions (each codon position of each gene), MODELTEST was used to determine the best-fitting model of sequence evolution. As further explained in Table , codon partitions with similar models and model parameters were merged, resulting in nine partitions for the Bayesian analyses. Two runs of four Markov chains each were run simultaneously for 1,000,000 generations, with equal prior probabilities for all trees and starting from a random tree.
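The site-exclusion rule stated above (drop any alignment column that is unsequenced or gapped in more than 25% of the taxa) can be illustrated with a minimal Python sketch. It is not the authors' procedure, which was carried out in the MUST package. The missing-data symbols `-` and `?`, the function names, and the toy taxa are assumptions made for illustration, and the sketch ignores the codon frame and the separate removal of amino acid repeats.

```python
from typing import Dict, List

# Assumed symbols for gapped or unsequenced sites (not specified in the text).
MISSING = set("-?")


def kept_columns(alignment: Dict[str, str], max_missing: float = 0.25) -> List[int]:
    """Return indices of columns whose missing fraction does not exceed max_missing."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    n_taxa = len(seqs)
    keep = []
    for i in range(length):
        n_missing = sum(1 for s in seqs if s[i] in MISSING)
        # A column is excluded only when MORE than max_missing of the taxa lack data.
        if n_missing / n_taxa <= max_missing:
            keep.append(i)
    return keep


def filter_alignment(alignment: Dict[str, str], max_missing: float = 0.25) -> Dict[str, str]:
    """Return a copy of the alignment restricted to the retained columns."""
    cols = kept_columns(alignment, max_missing)
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment: column 3 is gapped in 2 of 4 taxa (50%) and is dropped;
# column 2 is missing in 1 of 4 taxa (exactly 25%) and is kept.
toy = {
    "taxonA": "ATG-CA",
    "taxonB": "ATG-CA",
    "taxonC": "AT?GCA",
    "taxonD": "ATGGCA",
}
filtered = filter_alignment(toy)
```

Because the text excludes sites missing in more than 25% of the taxa, a column missing in exactly 25% is retained here; a stricter reading would change `<=` to `<`.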
Trees were sampled every 20 generations, and the consensus tree with posterior probabilities was calculated after removal of the first 25% of the total number of trees generated, corresponding to 12,500 trees. The average standard deviation of split frequencies between the two independent runs was lower than 0.01. To assess the stability of the phylogenetic position of Geogale aurita, our result was compared with the hypotheses of Olson and Goodman [] and Asher and Hofreiter [] using the tests of both Kishino and Hasegawa [] and Shimodaira and Hasegawa [] (with RELL bootstrap as well as full optimization). Furthermore, Ka (the number of nonsynonymous substitutions per nonsynonymous site) and Ks (the number of synonymous substitutions per synonymous site) were calculated for pairs of tenrec sequences using the program CODEML from the PAML package [], in order to assess the molecular divergence between the two Geogale GHR sequences and compare it with the level of molecular divergence observed within the Malagasy tenrec clade. […]
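The tree counts implied by the Bayesian settings above can be checked with a short calculation. This is a sketch under two assumptions: the starting tree at generation 0 is not counted as a sample, and the 25% discard is applied to each run separately. The variable and function names are illustrative only. Under that reading, the reported 12,500 discarded trees matches a per-run count; the text itself does not say whether the figure is per run or in total.

```python
def sampled_trees(generations: int, sample_every: int) -> int:
    """Trees sampled in one run, ignoring the starting tree (assumption)."""
    return generations // sample_every


def discarded_trees(n_sampled: int, fraction: float) -> int:
    """Trees removed from the start of a run before summarizing."""
    return int(n_sampled * fraction)


GENERATIONS = 1_000_000   # from the text
SAMPLE_EVERY = 20         # from the text
DISCARD_FRACTION = 0.25   # from the text
N_RUNS = 2                # from the text

per_run = sampled_trees(GENERATIONS, SAMPLE_EVERY)        # 50,000 trees per run
discarded = discarded_trees(per_run, DISCARD_FRACTION)    # 12,500 trees per run
retained_per_run = per_run - discarded                    # 37,500 trees per run
retained_total = retained_per_run * N_RUNS                # 75,000 trees over both runs

print(per_run, discarded, retained_per_run, retained_total)
```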

