Computational protocol: Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations

Protocol publication

[…] Our model is implemented using the Stan probabilistic language for statistical modeling. In particular, we use the RStan R package, which provides several inference algorithms. In this article we used Automatic Differentiation Variational Inference (ADVI). ADVI is a “black-box” variational inference program that is much faster than automatic inference approaches based on Markov Chain Monte Carlo (MCMC) algorithms. Although the solutions to variational inference optimization problems only approximate the posterior, the algorithm is fast and effective for our applications.

Our model requires a choice of a few hyperparameters, γ_τ, γ_b, γ_ρ and γ_ε, which are the scale parameters of half-Cauchy distributions. The half-Cauchy distribution is recommended by Gelman et al. as a weakly informative prior for scale parameters, and as a default prior for routine applied use in regression models. It has a broad peak at zero and allows for occasional large coefficients while still performing a reasonable amount of shrinkage for coefficients near the central value. The scale hyperparameters were set at 2.5, as we do not expect very large deviations from the mean values. The value 2.5 is also recommended by Gelman, and is a default choice for positive scale parameters in many models described in the RStan software manual.

[...] We developed visual tools for inferring and studying patterns related to the natural ordering in the data. Our visualizations uncover hidden trajectories with corresponding uncertainties. They also show how sampling density varies along a latent curve, i.e. how well a dataset covers different regions of an underlying gradient.
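As a brief illustration of the half-Cauchy prior with scale 2.5 discussed above, the following minimal sketch evaluates its density and quantiles in plain Python. It is not part of the authors' Stan model; the function names (`half_cauchy_pdf`, `half_cauchy_quantile`, `half_cauchy_sample`) are hypothetical and chosen only for this example. It shows the two properties mentioned in the text: the density peaks at zero, and the tail is heavy enough to allow occasional large values.

```python
import math
import random


def half_cauchy_pdf(x, scale=2.5):
    """Density of a half-Cauchy distribution (location 0) at x."""
    if x < 0:
        return 0.0
    return 2.0 / (math.pi * scale * (1.0 + (x / scale) ** 2))


def half_cauchy_quantile(p, scale=2.5):
    """Inverse CDF: the value below which a fraction p of the mass lies."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return scale * math.tan(math.pi * p / 2.0)


def half_cauchy_sample(n, scale=2.5, seed=0):
    """Draw n values by inverse-transform sampling (illustrative only)."""
    rng = random.Random(seed)
    return [half_cauchy_quantile(rng.random(), scale) for _ in range(n)]


if __name__ == "__main__":
    # The density is highest at zero and halves at x = scale.
    print(half_cauchy_pdf(0.0), half_cauchy_pdf(2.5))
    # The median equals the scale; upper quantiles grow quickly (heavy tail).
    print(half_cauchy_quantile(0.5), half_cauchy_quantile(0.95))
```

With scale 2.5, half of the prior mass lies below 2.5, while the upper quantiles are far larger, which is what makes the prior weakly informative rather than restrictive.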
We implemented a multi-view design with a set of visual components: 1) a plot of latent τ against its ranking, 2) a plot of τ against a sample covariate, 3) a heatmap of reordered data, 4) a 2D and a 3D posterior trajectory plot, 5) a data density plot, 6) a datapoint location confidence contour plot, 7) a feature curves plot. The settings panel and the visualization interface are depicted in Figs. 2 and 3. For our visualizations we chose the recently developed viridis color map, designed analytically to be “perfectly perceptually-uniform” as well as color-blind friendly. This color palette is effective for heatmaps and other visualizations and has now been implemented as a default choice in many visualization packages such as plotly or heatmaply.

[...] Often it is also useful to visualize a data trajectory in 2D or 3D. We use dimensionality reduction methods such as principal coordinate analysis (PCoA) and t-distributed stochastic neighbor embedding (t-SNE) on computed dissimilarities to display low-dimensional representations of the data. PCoA is a linear projection method, while t-SNE is a non-convex and non-linear technique.

After plotting datapoints in the reduced two- or three-dimensional space, we superimpose the estimated trajectories, i.e. we add paths which connect observations according to the ordering specified by posterior samples of τ. We usually show 50 posterior trajectories (shown as blue lines), and one highlighted trajectory that corresponds to the posterior mode estimate of τ. To avoid a crowded view, the mode-trajectory is shown as a path connecting only a subset of points evenly distributed along the gradient, i.e. corresponding to τ_i values evenly spaced in [0,1]. We also include a 3D plot of the trajectory, as the first two principal axes sometimes do not explain all the variability of interest. The third axis might provide additional information.
The 3D plot provides an interactive view of the ‘mode-trajectory’; it allows the user to rotate the graph to obtain an optimal view of the estimated path. The rotation feature also facilitates generating 2D plots with any combination of two out of three principal components (PC1, PC2, PC3), which is an efficient alternative to including three separate plots (Fig. 7). […]
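The trajectory construction described above (paths that connect observations in the order given by τ, and a thinned mode-trajectory through points evenly spread over [0,1]) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, τ is assumed to be already scaled to [0,1], and picking the observation nearest to each grid value is an assumed way of choosing the "evenly distributed" subset.

```python
def ordering_from_tau(tau):
    """Return observation indices sorted by their latent position tau."""
    return sorted(range(len(tau)), key=lambda i: tau[i])


def posterior_trajectories(tau_samples):
    """Build one path (an ordered list of indices) per posterior draw of tau."""
    return [ordering_from_tau(draw) for draw in tau_samples]


def evenly_spaced_subset(tau_mode, n_points):
    """Pick observations whose tau is nearest to an even grid on [0, 1].

    Duplicates are dropped, and the result is ordered by tau so it can be
    drawn directly as a path.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    grid = [k / (n_points - 1) for k in range(n_points)]
    chosen = set()
    for g in grid:
        nearest = min(range(len(tau_mode)), key=lambda i: abs(tau_mode[i] - g))
        chosen.add(nearest)
    return sorted(chosen, key=lambda i: tau_mode[i])


if __name__ == "__main__":
    # Hypothetical mode estimate of tau for six observations.
    tau_mode = [0.9, 0.1, 0.5, 0.0, 1.0, 0.45]
    print(ordering_from_tau(tau_mode))        # full path through all points
    print(evenly_spaced_subset(tau_mode, 3))  # thinned mode-trajectory
```

In a plot, each index list would be mapped to the 2D or 3D coordinates of the corresponding observations (for example from PCoA) and drawn as a connected line, with the posterior-draw paths in the background and the thinned mode path highlighted.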

Pipeline specifications