Computational protocol: A New Support Measure to Quantify the Impact of Local Optima in Phylogenetic Analyses

Protocol publication

[…] We define the p-support of a clade as the percentage of the p peaks with majority support for that clade. A p-support value of 100% means that the clade was supported by every peak, whereas 0% implies that the clade did not receive majority support in any of the p peaks. p-support can be viewed as a measure of precision at the peak level, in much the same way that bootstrap and jackknife support are measures of precision at the character and taxon level. High p-support values signal that a clade is in high agreement across the peaks and is therefore less likely to be overturned by additional analysis. Like other common support measures, p-support can help identify the areas of a tree that may benefit most from additional data and analysis. In this way, support measures are a useful tool for illuminating new problems and hypotheses.

The most critical feature of the p-support measure is the identification of the p peaks, which are the input to the p-support calculation. We have developed the PeakMapper algorithm to determine how many distinct sets of trees are contained in a data set, as well as which trees are contained in each peak. While our technique uses clustering to identify the peaks among the trees, p-support is independent of our PeakMapper algorithm: any method that identifies distinct sets of trees can be used with our p-support measure. For instance, if tree islands were detected and labeled in a data set, that information could be used to compute p-support. Our PeakMapper software identifies peaks in a tree collection and annotates majority and strict consensus trees with p-support values that can be viewed in standard tree-viewing packages such as FigTree. Furthermore, our PeakMapper software is designed for analyzing large-scale tree collections (e.g., tens of thousands of trees). [...]

Using our PeakMapper approach, we analyze two published Bayesian studies, one on a 150 taxa data set of desert algae and green plants and one on a 567 taxa data set of angiosperms.
The 150 taxa data set consists of 20,000 trees from two runs of the MrBayes phylogenetic heuristic. The 567 taxa data set contains 33,306 trees from 12 Bayesian runs. Both of these tree collections have high majority consensus resolution rates. Our approach shows that both tree sets contain multiple peaks: two and six peaks are found for the 150 and 567 taxa data sets, respectively. Hence, high consensus resolution rates do not exclude the possibility of a tree set containing multiple peaks. These data sets present two interesting cases: the peaks in the 150 taxa data set contain equal numbers of trees, whereas the peaks in the 567 taxa data set are disproportionate in size. These cases show how the distribution of trees across peaks can affect the resulting majority consensus tree, and how p-support can provide previously unavailable information about the distribution of the clades. We show that the 150 taxa data set contains three clades that appear in the majority consensus tree but are supported by only one of the two peaks. The 567 taxa data set contains seven clades in the majority consensus tree that are supported by only three of the six peaks, and one clade in the majority consensus tree that is supported by only two of the six peaks.

Overall, our work presents systematists with a new measure, called p-support, for quantifying the robustness of inferred relationships in an evolutionary tree. We hope that p-support can provide researchers and the community at large with more information about the results of phylogenetic analyses, especially with regard to which regions of the tree may benefit most from further investigation. […]
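
The p-support definition above can be illustrated with a minimal Python sketch. This is not the PeakMapper software, and it does not identify peaks; it assumes the trees have already been grouped into peaks by some method. The tree representation (a tree as a set of clades, a clade as a frozenset of taxon names), the function names majority_clades and p_support, the reading of "majority support" as presence in more than 50% of a peak's trees, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def majority_clades(trees, threshold=0.5):
    """Return the clades found in more than `threshold` of `trees`."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, n in counts.items() if n > threshold * len(trees)}


def p_support(peaks, threshold=0.5):
    """Map each clade to the percentage of peaks with majority support for it."""
    per_peak = [majority_clades(trees, threshold) for trees in peaks]
    clades = set().union(*per_peak)
    return {c: 100.0 * sum(c in m for m in per_peak) / len(peaks) for c in clades}


# Hypothetical toy data: a tree is a set of clades, a clade a frozenset of taxa.
ab, cd, be = frozenset("AB"), frozenset("CD"), frozenset("BE")
peak_1 = [{ab, cd}] * 6  # six trees that all contain clades AB and CD
peak_2 = [{cd, be}] * 4  # four trees that all contain clades CD and BE
peaks = [peak_1, peak_2]

# Majority consensus clades over the pooled collection, ignoring peak labels.
consensus = majority_clades(peak_1 + peak_2)
support = p_support(peaks)

# Report p-support for each clade of the pooled majority consensus.
for clade in sorted(consensus, key=sorted):
    print("".join(sorted(clade)), support[clade])
```

In this toy collection, clade AB appears in 60% of the pooled trees and so enters the majority consensus, yet only one of the two peaks supports it, giving a p-support of 50%. Clade CD is supported by both peaks and has a p-support of 100%. This mirrors, on a very small scale, the situation described above in which a clade appears in the majority consensus tree while being supported by only some of the peaks.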

Pipeline specifications

Software tools: FigTree, MrBayes
Application: Phylogenetics