Computational protocol: A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits

Protocol publication

[…] Archaeopteryx is aware of the semantic issue and offers an option to define the semantics of annotated values. By default, it interprets node labels as node labels, so the rerooted tree is displayed correctly only under the node interpretation. When the option “Internal Node Names are Confidence Values” is activated, the rerooting algorithms correctly shift support values to the corresponding branches. Prior to v. 0.9911, there was a minor issue in displaying these values on screen; this was fixed after we contacted the developers. Archaeopteryx does not support the comment notation (e.g., tree TC).
ATV is the predecessor of Archaeopteryx. Different versions seem to alternate between the two possible interpretations of inner node labels. The version we tested uses the branch interpretation of node labels and thus reroots correctly.
Dendroscope versions prior to v. 3.5.0 offered only the node interpretation of node labels for our test trees. This led to incorrect results when rerooting trees whose node labels actually represented branch support values. Dendroscope interpreted Newick comments as support values only if the tree also contained branch lengths (e.g., tree TC plus branch lengths). The alternative notation using inner node labels (e.g., tree TN) is not affected by this and always receives the node label interpretation. This behavior was not fully documented in the manual. We assess its impact on published empirical phylogenetic studies in section “Impact on Empirical Phylogenetic Studies”. In the latest versions of Dendroscope (v. 3.5.0 up to v. 3.5.4), Daniel Huson implemented all of the recommendations (see section “Conclusions”) that we made in the first bioRxiv preprint of this review. When reading a Newick file with node labels, Dendroscope now explicitly asks the user for the intended interpretation. It also has a menu option to choose between the interpretations.
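The difference between the node and the branch interpretation during rerooting can be made concrete with a small sketch. The code below is not taken from any of the tools discussed; the `Node` class, the helper names, and the example tree are hypothetical and exist only for illustration. It reroots a toy tree at an inner node and records, for each inner value, the bipartition cut by the branch above the node that carries it. Under the branch interpretation, values on the path between the old and the new root have to move to the other end of their branch; under the node interpretation, they stay where they are.

```python
class Node:
    """Minimal rooted-tree node: leaves carry a taxon name, inner nodes a value."""

    def __init__(self, label=None, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def leaf_names(node):
    """Return the set of taxon names found at or below `node`."""
    if not node.children:
        return {node.label}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def labelled_splits(root):
    """Map each inner value to the bipartition cut by the branch above its node."""
    all_leaves = frozenset(leaf_names(root))
    splits = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        is_inner = bool(node.children) and node.parent is not None
        if is_inner and node.label is not None:
            below = frozenset(leaf_names(node))
            splits[node.label] = frozenset({below, all_leaves - below})
    return splits


def reroot(new_root, labels_are_branch_values):
    """Reroot at the inner node `new_root` by reversing the path to the old root."""
    path = []
    node = new_root
    while node is not None:
        path.append(node)
        node = node.parent
    if labels_are_branch_values:
        # A value describes the branch above its node. Once that branch is
        # reversed, the value must sit at the other end of the same branch.
        for i in range(len(path) - 1, 0, -1):
            path[i].label = path[i - 1].label
        new_root.label = None
    for child, parent in zip(path, path[1:]):
        parent.children.remove(child)
        child.children.append(parent)
        parent.parent = child
    new_root.parent = None
    return new_root


def build_example():
    """Build ((A,B)90,((C,D)80,E)70,F); return the root and the node labelled 80."""
    z = Node(80, [Node("C"), Node("D")])
    y = Node(70, [z, Node("E")])
    x = Node(90, [Node("A"), Node("B")])
    return Node(None, [x, y, Node("F")]), z


original = labelled_splits(build_example()[0])
branch_view = labelled_splits(reroot(build_example()[1], True))
node_view = labelled_splits(reroot(build_example()[1], False))

# Branch interpretation: every value still refers to its original bipartition.
print(branch_view == original)
# Node interpretation: 70 now sits on the bipartition that 80 used to support.
print(node_view[70] == original[80])
```

In this toy case, the node interpretation silently attaches the value 70 to the branch that was originally supported with 80, and the value 80 ends up at the new root, where it no longer describes any branch.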
ETE (GUI) is another viewer that supports both interpretations. When reading a Newick-formatted tree, it offers an option for specifying label semantics. The comment notation (e.g., tree TC) is not supported.
EvolView is able to display numerical values at inner nodes. Rerooting, however, moves those values to the wrong nodes and sets some of them to zero. Rerooting a given tree several times at different branches results in all inner node values becoming zero. Furthermore, rerooting does not resolve the initial trifurcation properly, so that the resulting tree contains a multifurcation at node R. The developers are aware of these issues and intend to fix them in a future release.
FigTree is able to display multiple inner node labels using both semantic interpretations. When rerooting the tree, however, there is no option to define the interpretation of the node labels; FigTree internally always assumes the branch interpretation. Thus, after rerooting a tree with actual node labels, the labels are mapped to the wrong nodes. In addition, it cannot parse certain Newick variants, such as trees that contain both branch lengths and support values stored as comments.
iTOL works correctly. If the inner values are numbers, it implicitly applies the branch support interpretation. If they are strings, it interprets them as inner node names. In both cases, rerooting works as expected. However, it offers no explicit option to change this behavior, that is, to interpret numbers as belonging to the nodes or strings as belonging to the branches.
PhyloWidget interprets node labels as node labels. Thus, rerooting a tree with branch support values yields errors. As in EvolView, rerooting also does not resolve the initial trifurcation. PhyloWidget is no longer maintained, and its authors recommend not using it for rerooting phylogenies or displaying branch support values. Therefore, it is marked as not correct.
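The type-based rule described for iTOL above (numbers are branch support values, strings are node names) can be sketched as follows. This is a hypothetical helper written for illustration, not iTOL code; the function name and the returned category strings are assumptions.

```python
import math


def classify_inner_label(label):
    """Guess the meaning of an inner Newick label from its type alone."""
    text = label.strip()
    if not text:
        return "unlabelled"
    try:
        value = float(text)
    except ValueError:
        return "node_name"
    # Treat only finite numbers as support; "nan" or "inf" are kept as names.
    return "branch_support" if math.isfinite(value) else "node_name"


for label in ["95", "0.87", "Clade_A", "2016"]:
    print(label, classify_inner_label(label))
```

The last label shows why such a rule cannot replace an explicit option: a clade that is deliberately named with a number, such as "2016", would be treated as a support value.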
TreeView interprets node labels as branch support values and reroots correctly under this interpretation. However, it displays the values next to the nodes instead of the branches, which may cause confusion.
T-REX also applies the branch interpretation and reroots correctly. However, the branch support values are always displayed as percentages, that is, followed by a “%” sign, which is not always the correct or desired way to display them. The developers plan to fix this in the next release. Hence, we marked it as almost correct. T-REX does not work with the comment notation.
[...] APE interprets inner node labels as node attributes when rerooting. We reported this issue to the project maintainers, and a new version of the package (v. 3.6) is now available that allows node labels to be handled as support values when rooting. In addition, a workaround that patches previous APE versions is provided with this manuscript.
BioPerl offers options to explicitly load node labels as branch or node attributes. When the branch interpretation is selected, the rerooting algorithms work correctly.
BioPython, with the BioPhylo module for handling trees, interprets inner node labels as confidence values when parsing a Newick tree. However, it handles those values as node attributes rather than branch attributes when rerooting the tree, which leads to incorrectly positioned support values. The same behavior is observed when support values are loaded explicitly using the PhyloXML format. This is a known issue in the project, and a fix is being developed.
Dendropy loads inner node labels as node attributes. Therefore, if those labels are meant to represent support values, rerooting will lead to incorrect results. The Dendropy documentation explains this behavior in detail, and a workaround is available that makes it possible to reroot trees in which bootstrap values are encoded as node labels in the Newick format.
A new option has been added in version 4.2 that automatically translates node labels into branch support values when loading a Newick tree, so that rerooting algorithms can be applied safely without further tree processing.
ETE (API) offers the same options as when used for tree visualization (see above). Node labels can be loaded as node names (node attributes) or branch support values (branch attributes). When rerooting, branch support values are correctly remapped to branches.
Geneious is able to read both Newick notations and by default interprets the values as node labels. The branch interpretation is available as an undocumented feature, depending on the naming of those values. However, when rerooting the tree, the values are treated as belonging to the branches in both cases, which results in misplaced node labels. The maintainers are planning to fix this and to make the interpretation choice more apparent.
MEGA is able to read both notations and interprets the values as branch support values in both cases. Rerooting works correctly under this interpretation.
Mesquite understands the node label notation, but not the comment notation. By default, it interprets node labels as node labels and reroots correctly. There is also a function to reinterpret internal node labels and turn them into branch values; rerooting works correctly after this transformation. For a future release, the maintainers plan to implement a user prompt for choosing the interpretation when a tree with inner node labels is loaded.
Newick Utilities does not handle node labels as branch attributes by default, which leads to incorrect results when rerooting Newick trees. After we reported the issue, a previously undocumented option (-s) was documented that automatically interprets inner node labels as branch support attributes.
Pycogent interprets inner node labels as support values by default, and the rooting functions handle them correctly.
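Several of the tools above read only one of the two Newick notations. As a rough illustration of how the two relate, the sketch below rewrites the comment notation into the node label notation. It assumes that the support value is stored as a square-bracket Newick comment directly after the closing parenthesis of a clade, optionally after a branch length, which is how RAxML-style output with branch lengths looks. The function name is hypothetical, and the regular expression does not handle quoted labels or comments in other positions.

```python
import re

# ")" + optional ":length" + "[value]" at the end of an inner clade.
_CLADE_COMMENT = re.compile(r"\)(:[^,()\[\];]+)?\[([^\[\]]*)\]")


def comments_to_node_labels(newick):
    """Rewrite '(...):0.5[90]' as '(...)90:0.5' and '(...)[90]' as '(...)90'."""
    def move_value(match):
        branch_length = match.group(1) or ""
        return ")" + match.group(2) + branch_length

    return _CLADE_COMMENT.sub(move_value, newick)


print(comments_to_node_labels("((A:1,B:1):0.5[90],(C:1,D:1):0.5[80],E:1);"))
print(comments_to_node_labels("((A,B)[90],C);"))
```

Such a rewrite changes only the notation. Whether the resulting inner labels are then treated as node or branch attributes still depends on the tool that reads the converted file.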
[...] Users who are not aware of the implicit semantic assumptions of tree manipulation tools might obtain tree visualizations with incorrectly mapped support values. This is particularly the case if the node interpretation is wrongly applied to branch support values. Most prominently, older versions of Dendroscope (before version 3.5.0, see section “Results”) implicitly interpret node labels as simply that: node labels. The extent to which this affects published phylogenies is hard to quantify, because all visualized phylogenies in all published papers citing Dendroscope (over 1,200 for the two Dendroscope papers based on Google Scholar, accessed on August 15, 2016) would need to be checked, and all original tree files would need to be available, which they should be in principle. Hence, this is also an issue of reproducibility of scientific results, even if in our case it simply boils down to making a published Newick tree with support values available for download.
To get at least a feeling for the visualization and reproducibility issue, we contacted the authors of 14 papers that used Dendroscope to visualize trees with support values, published in journals such as Nature, PLOS, BMC, and JBC. Of the contacted authors, five replied, but only two were ultimately able to provide us with the trees that were used to generate the visualizations in their publications.
In the following, we analyze the trees visualized for these two papers with respect to the correctness of the branch support value mapping.
The first article presents a phylogeny of 80 Arabidopsis accessions (see fig. 4b of that article) along with bootstrap values for some of the branches. The tree and bootstrap values were inferred with RAxML 7.3.5, which writes a tree file that uses Newick comments for storing support values. Dendroscope was used to reroot and visualize the tree. As already mentioned, the tool is able to correctly handle this variant of stored support values.
Thus, the error did not occur in this paper and the tree is correctly visualized.
The second article presents several phylogenies for all three domains of life. The trees were inferred using RAxML v7.2.6 and PHYML v3.0. Branch support values were estimated with PHYML using the SH-like likelihood ratio test, which reports support values as node labels. All trees in figures 2 and 4–7 of that article were rerooted using Dendroscope so that they can be more easily compared with the comprehensive trees presented in figure 1 of the article. In all cases, branch support values were mapped incorrectly to the rerooted trees in these figures.
We illustrate this in Fig. 2, which shows three versions of one tree. The first is the original Newick tree used to generate figure 2a in the article. We have marked the branch used for (re)rooting the tree with a red cross and colored the subtrees so that their corresponding positions in the rerooted tree are easily visible. The second is the tree rerooted using Dendroscope v. 3.4.0, which is identical to the published one. The branch support values between the old and the new root node in the original tree are not mapped to the same bipartitions in this rerooted tree. For example, in the original tree the support value underlined in green refers to the bipartition that separates the green taxa from the blue taxon and the red taxa, whereas in the rerooted tree it refers to the bipartition that separates the red taxa from the green taxa and the blue taxon. Fortunately, in this specific case, the incorrectly mapped support values do not change the conclusions of the paper (pers. comm. with Daniel Lundin on December 28, 2015). The third is the correctly rerooted tree, created with the updated Dendroscope version 3.5.3. There, the value underlined in green refers to the correct bipartition. Furthermore, the value underlined in red is correctly duplicated at both outgoing branches of the root. […]
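The duplication of a support value at both outgoing branches of the root follows directly from treating support as a property of branches. The sketch below illustrates this under stated assumptions: the unrooted tree is stored as a dictionary that maps each branch (a pair of node names) to its support value, and all names (`edges`, `root_on_branch`, `bipartition`, the node called `ROOT`) are hypothetical. Rooting on a branch splits it in two, and both halves cut the same bipartition as the original branch, so both inherit its value.

```python
from collections import defaultdict


def root_on_branch(edges, u, v, root="ROOT"):
    """Place a new root node on branch u-v; both halves keep its support value."""
    key = frozenset((u, v))
    rooted = {edge: support for edge, support in edges.items() if edge != key}
    rooted[frozenset((root, u))] = edges[key]
    rooted[frozenset((root, v))] = edges[key]
    return rooted


def bipartition(edges, edge, leaves):
    """Return the two leaf sets that are separated when `edge` is removed."""
    adjacency = defaultdict(set)
    for other in edges:
        if other != edge:
            a, b = tuple(other)
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(edge))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    side = frozenset(seen & leaves)
    return frozenset({side, frozenset(leaves - side)})


# Unrooted toy tree: x joins A and B, z joins D and E, y joins x, C and z.
leaves = frozenset("ABCDE")
edges = {
    frozenset(("A", "x")): None,
    frozenset(("B", "x")): None,
    frozenset(("x", "y")): 90,
    frozenset(("C", "y")): None,
    frozenset(("y", "z")): 80,
    frozenset(("D", "z")): None,
    frozenset(("E", "z")): None,
}

rooted = root_on_branch(edges, "y", "z")
old_split = bipartition(edges, frozenset(("y", "z")), leaves)
print(rooted[frozenset(("ROOT", "y"))], rooted[frozenset(("ROOT", "z"))])
print(bipartition(rooted, frozenset(("ROOT", "y")), leaves) == old_split)
print(bipartition(rooted, frozenset(("ROOT", "z")), leaves) == old_split)
```

Because the values are keyed by branch rather than by node, no value has to be moved when the root changes; only the branch that receives the root needs special handling.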

Pipeline specifications