Computational protocol: Chasing the hare - Evaluating the phylogenetic utility of a nuclear single copy gene region at and below species level within the species rich group Peperomia (Piperaceae)

Protocol publication

[…] Several mostly small regions of uncertain sequence homology (hotspots) had to be excluded from the different data matrices (Additional File : Hotspots excluded, due to ambiguous homology assessments). All alignments are available from TreeBASE.

Indel matrices were calculated using the "simple indel coding" (SIC) approach and were generated automatically with the indel coding tool of SeqState. Substitution models for Bayesian inference (BI) were determined using jModelTest. For the agt1 dataset, the general time reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation following a gamma distribution (GTR+I+Γ) was selected as the best-fitting model under the Akaike information criterion (AIC). For the three chloroplast datasets, GTR+Γ was the best-fitting model. Bayesian MCMC inferences were performed with MrBayes v3.1 using the substitution models mentioned above.

The BI was run with four Markov chains simultaneously for 4 million generations, saving trees every 100 generations. The burn-in was set individually for each analysis to between 5% and 20% after stationarity of each run had been determined with Tracer v1.5. At least ten runs were combined to generate the consensus trees and posterior probabilities for each individual analysis. Maximum Likelihood analyses were run in RAxML version 7.2.7.a; the rapid bootstrap algorithm was used so that the number of bootstrap replicates could be increased to 1,000.

The degree of homoplasy of each dataset, both with and without indels, was assessed on a Maximum Parsimony (MP) tree obtained using a parsimony ratchet approach. Command files for MP analyses were created using PRAP and executed in PAUP*4b10. Topologies were obtained with the heuristic search strategy and 10 random addition cycles of 200 iterations each, with a 25% upweighting of the characters in the iterations.
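
The simple indel coding step described above can be illustrated with a short sketch. This is not SeqState's implementation; it is a minimal Python version of the general SIC idea, in which every gap with a distinct start and end column becomes one binary character. The function names, the toy alignment, and the decision to score leading and trailing gaps as missing data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Gap = Tuple[int, int]  # (first column, last column), 0-based and inclusive


def internal_gaps(seq: str, gap_char: str = "-") -> Tuple[int, int, List[Gap]]:
    """Return (first residue column, last residue column, internal gap runs)."""
    n = len(seq)
    first = next((i for i in range(n) if seq[i] != gap_char), n)
    last = next((i for i in range(n - 1, -1, -1) if seq[i] != gap_char), -1)
    gaps: List[Gap] = []
    i = first
    while i <= last:
        if seq[i] == gap_char:
            j = i
            # seq[last] is a residue, so this scan always stops before the end
            while seq[j + 1] == gap_char:
                j += 1
            gaps.append((i, j))
            i = j + 1
        else:
            i += 1
    return first, last, gaps


def simple_indel_coding(alignment: Dict[str, str]) -> Tuple[List[Gap], Dict[str, str]]:
    """Code each distinct gap as one character: 1 present, 0 absent, ? inapplicable."""
    if len({len(s) for s in alignment.values()}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    info = {name: internal_gaps(seq) for name, seq in alignment.items()}
    characters = sorted({g for _, _, gaps in info.values() for g in gaps})
    coded: Dict[str, str] = {}
    for name, (first, last, gaps) in info.items():
        states = []
        for start, end in characters:
            if (start, end) in gaps:
                states.append("1")  # this exact gap is present
            elif start < first or end > last:
                states.append("?")  # overlaps a terminal (unsequenced) region
            elif any(s <= start and end <= e for s, e in gaps):
                states.append("?")  # swallowed by a longer gap
            else:
                states.append("0")  # gap is absent
        coded[name] = "".join(states)
    return characters, coded


# Hypothetical toy alignment, not data from the study.
toy_alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "AC--ACGTAC",
    "taxonC": "A-----GTAC",
    "taxonD": "AC--AC--AC",
}
indel_characters, indel_matrix = simple_indel_coding(toy_alignment)
for taxon, row in indel_matrix.items():
    print(taxon, row)
```

In the toy alignment, taxonC receives "?" for the short gap at columns 2-3 because its longer gap at columns 1-5 fully contains it, so the presence or absence of the short gap cannot be scored for that taxon.
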
TreeGraph2 was used to compile and draw all trees.

Sequence statistics for specified regions of each marker were obtained with SeqState. The Shimodaira-Hasegawa (SH) test was performed in PAUP*4b10 with full optimization to evaluate the topologies obtained from the different genetic markers against each other. The SH test compensates for the a posteriori selection of multiple alternative topologies by adjusting the expected difference in log-likelihood values. Topology tests were performed on a version of the agt1 dataset containing one randomly selected single copy, so that the sampling was congruent between datasets. Where two markers conflicted (p ≤ 0.05), the test was repeated with a manually modified topology to localize the conflict. To do so, one hypothesis (topology) was adjusted stepwise to the phylogenetic results of the conflicting marker to evaluate the effects of single changes in clade position. […]
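
As an illustration of the per-region sequence statistics mentioned above, the following Python sketch summarises a block of alignment columns. The excerpt does not state which statistics SeqState reported, so the measures chosen here (ungapped length range and mean, GC content, number of variable sites), the function name, and the toy alignment are all assumptions and not the authors' procedure.

```python
from typing import Dict


def region_statistics(alignment: Dict[str, str], start: int, end: int) -> Dict[str, float]:
    """Summarise alignment columns start..end (0-based, end exclusive)."""
    region = {name: seq[start:end].upper() for name, seq in alignment.items()}
    ungapped = {name: s.replace("-", "") for name, s in region.items()}
    lengths = [len(s) for s in ungapped.values()]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in ungapped.values())

    # A column counts as variable if it shows more than one unambiguous base;
    # gaps and ambiguity codes are ignored.
    variable = 0
    for column in zip(*region.values()):
        bases = {c for c in column if c in "ACGT"}
        if len(bases) > 1:
            variable += 1

    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": total / len(lengths),
        "gc_percent": 100.0 * gc / total if total else 0.0,
        "variable_sites": variable,
    }


# Hypothetical toy alignment, not data from the study.
stats_alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "AC--ACGTAT",
    "taxonC": "ATGTAC-TAC",
}
whole_region = region_statistics(stats_alignment, 0, 10)
for key, value in whole_region.items():
    print(key, value)
```

Passing different start and end columns restricts the summary to a single region of a marker, for example one intron or spacer within a longer alignment.
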

Pipeline specifications

Software tools: jModelTest, MrBayes, RAxML, TreeGraph
Databases: TreeBASE
Application: Phylogenetics