Computational protocol: The root of the East African cichlid radiations

Similar protocols

Protocol publication

[…] A total of 63 specimens of 54 species were included, representing all major groups of African cichlids, focusing on Haplotilapiines sensu Schliewen & Stiassny []. To serve as nested outgroups, members of all basal African lineages (Tylochromini, Chromidotilapiini, Pelmatochromini, Hemichromini) were added. As several recent molecular and morphological studies support the basal position of Heterochromis multidens with respect to the rest of the African cichlid radiation [,,], this taxon served as outgroup. Total genomic DNA was isolated from fin clips or muscle tissue using the Qiagen Tissue Extraction Kit (DNeasy Tissue Extraction Kit) following the manufacturer's protocol. The following mitochondrial markers were amplified and sequenced: partial mitochondrial 12S and 16S genes and the intervening sequence between them, and ND2. Additionally, four nuclear protein coding genes (ENC1, Ptr, SH3PX3 and Tmo4c4) and the first intron of the ribosomal protein coding gene S7 were amplified and sequenced. The programs BioEdit (ClustalW) and MUSCLE v.3.6 were used for sequence alignment, followed by a control for ambiguous alignment positions using ALISCORE v.0.2 under default settings []. ALISCORE checks for random sequence similarity using MCMC and a sliding windows approach. Under this regime, similarity profiles based on pairwise comparisons of sequences were calculated. Ambiguous positions were summarized in a consensus profile along the alignment [] and subsequently removed from all analyses.Coding genes were translated into amino acid sequences to check for stop-codons or frame shifts and datasets were checked separately for saturation at each codon position. Base frequencies were equal for all markers (Chi-square tests, df = 183, all p > 0.9). The combined dataset of all sequenced markers resulted in a data matrix of 6176 total bp comprised of 12SrRNA: 349 bp, 16SrRNA:543 bp, 12S/16S:1245 bp, ND2: 1014 bp, ENC1: 725 bp, Ptr: 691 bp, SH3PX3: 681 bp, Tmo4c4: 425 bp and S7 (first intron): 503 bp. In addition, a second dataset of 263 ND2 sequences (900 bp) retrieved from Genbank and 38 newly sequenced ND2 sequences was generated (additional file ), resulting in high coverage over all major African cichlid tribes, some of which are not present in data set A. ND2 was chosen because this marker was available on GenBank for a representative sampling of African cichlids. The third codon position was saturated between in- and outgroups in dataset B, but because the focus of this analysis was the identification of terminal clades (younger splits), third positions were not excluded. Data were partitioned according to 1st, 2nd and 3rd codon position and all parameters were estimated separately. A ML phylogeny was constructed with RAXML v.7.0.3 using the rapid hill climbing bootstrap algorithm with 1000 replicates and following ML search. Branches not supported by 50% bootstrap values were collapsed. [...] Bayesian Inference (BI) and Maximum Likelihood (ML) approaches were used for phylogenetic inferences. The combined dataset was partitioned according to coding vs. non-coding and mitochondrial vs. nuclear genes yielding four partitions, i.e. two partitions for mitochondrial genes (rRNA and 1st and 2nd codon position of ND2) and two for nuclear genes (Exons and Intron). The third codon-position of ND2 was excluded from phylogenetic analyses (dataset A), as previous tests showed saturation between Haplotilapiine and basal taxa (data not shown). For each partition model parameters were estimated separately. For BI, best fitting models of sequence evolution were estimated using the Bayes Factor Test []. Bayesian analyses were performed using MrBayes v.3.1.2 [] with eight parallel runs each over 106 generations starting with random trees and sampling of trees every 1000 generations. To ensure convergence the first 105 generations of each run were treated as burn-in and excluded. The remaining trees from all Bayesian analyses were used to build a 50% majority rule consensus tree. The GTR + 3 model, implemented in the program RAxML v.7.0.3 [] was used for Maximum Likelihood analyses. Branch support was evaluated for the best scoring ML tree using non-parametric bootstrapping (BS) consisting of 1000 pseudoreplicates (using RAxML) and Bayesian posterior probabilities (BPP). [...] To test for unreliably placed taxa the leaf stability index [] was calculated for all taxa based on 1000 bootstrap trees using the program Phyutility v.2.2. []. This index is a good measure of the consistency of a taxon's position relative to other taxa across bootstrap replicates. Using the same program, branch attachment frequencies were calculated for clades with low support values (BS < 90) using 1000 bootstrap trees and the ML topology as well as the first 2000 BI topologies (after burn-in) and the BI topology. Following Seehausen [], we applied a tree-based method to test for excess homoplasy in our dataset, possibly introduced by taxa of ancient hybrid origin. The inclusion of a hybrid taxon would be expected to increase internal conflict in the tree and diminish support values for affected nodes owing the reticulate nature of the process []. To test for this possibility, each taxon was successively removed from the dataset (N = 63 experiments) and subsequently a likelihood run (using RAxML) under the GTR + 3 model with 1000 rapid bootstrap replicates was conducted for each resulting dataset. The resulting trees and bootstrap support values for the focus clades were checked manually. [...] Date estimates were calibrated using two age constraints. One calibration point (O) was based on the fossil record of Oreochromis lorenzoi† [] from the Early Miocene of the Baid Formation (5.98 mya, []). The second calibration point, assigned to the split between Tylochromis and the remaining African cichlids (except Heterochromis multidens), corresponds to the 95% credible interval estimates for African Cichlidae from Azuma et al. [], (exact dates were kindly provided by Y. Kumazawa and Y. Azuma, pers. comm.). Estimates based on both non-cichlid teleostean fossils (A1 53–84 mya) and Gondwana fragmentation (A2 71–89 mya) were taken. An exponential prior using a zero offset of 5.58 mya (marking the minimum age) with a mean of 1 was used for the fossil calibration point and a uniform prior with upper and lower bounds either from 53 to 84 mya (A1), 71 to 89 mya (A2) or a combination of both with 53 to 89 mya (A3) [] were fixed prior to analyses. As the distinction between Oreochromis and Sarotherodon is based on characters that are often not preserved in fossils [], at least two possible placements for the Oreochromis lorenzoi† fossil in the phylogenetic framework exist. Most conservative is a placement at the base of all mouthbrooding Tilapiines (O1) or, less so is a placement at the base of the genus Oreochromis (point O2). Oreochromis lorenzoi† [] is in our point of view one of the few reliable cichlid fossils suitable for calibration, as the type specimens are in a well preserved state and all key traits necessary for identification are recognizable. Its phylogenetic placement within the African cichlid phylogeny is less ambiguous than for other fossils, as the Oreochromini are a clearly monophyletic group (Figure ). Unfortunately this is not the case for most other African cichlid fossils, which often lack diagnostic characters necessary for a precise assignment to cichlid tribes (for a more detailed discussion see additional file ). Divergence time analyses were conducted using a log-normal distributed relaxed molecular clock MCMC approach [] as implemented in BEAST v.1.4.8 []. For all calculations data were partitioned as described earlier and the BI topology was used as starting tree. Separate substitution models were used for each partition based on the results of the Bayes Factor test. A pure birth model (Yule) was assigned as prior for the branching process and two independent and identical runs were conducted for each BEAST setup for 306 generations. Convergence of parameters was checked using Tracer v.1.4 []. The first 10% of generations were discarded as burn-in and the effective sample size (ESS) was checked for good mixing of the MCMC. All exceeded 200 for all model parameters. Divergence dates were also estimated using penalized likelihood [] as implemented in the program r8s v.7.1 []. The optimal smoothing parameter was 63 for each run determined by a cross-validation approach []. All runs were conducted several times with different sets of constraints to evaluate the influence of different calibration points. As expected inclusion of the fossil calibration point produced slightly younger but also narrower confidence intervals for all ages (additional file and Table ). Two alternative placements of Oreochromis lorenzoi† within the topology resulted in slightly different age estimates, with younger ages when the calibration point was set at the root of all Oreochromini. Using the penalized likelihood approach no difference in age estimates was observed for different placements of Oreochromis lorenzoi†. Overall, age estimates largely overlap independent of the priors used (additional file ). […]

Pipeline specifications

Software tools BioEdit, Clustal W, Aliscore, RAxML, MrBayes, phyutility, BEAST, r8s
Applications Phylogenetics, Nucleotide sequence alignment
Organisms Oreochromis niloticus