## Protocol publication

[…] This section describes the methodological details of our proposed methods. We first distinguish our new methods from existing ones in Fig. 1. **MiRKAT-S** [], which uses one distance metric at a time, is an existing method (blue letters, Fig. 1). Our methods (red letters, Fig. 1) are its adaptive test, **OMiRKAT-S**; MiSALN and its adaptive test, OMiSALN; and OMiSA. OMiSA is our main proposed method, and the other individual and sub-adaptive tests are the building blocks needed to reach it.

Fig. 1 [...]

The key idea behind MiRKAT-S [] is that diverse distance metrics (e.g., **UniFrac** distance [–] and Bray-Curtis dissimilarity []) can be incorporated into the kernel machine Cox proportional hazards model. Hence, we can examine the relationship between ecologically related metrics and survival outcomes in health or disease while adjusting for covariates (e.g., age and sex) []. First, we specify a sample-by-sample pairwise distance matrix based on a chosen distance metric and transform it into a kernel (similarity) matrix using the kernel formula, Eq. 9:

$$K = -\frac{1}{2}\left(I - \frac{\mathbf{1}\mathbf{1}'}{n}\right) D^{2} \left(I - \frac{\mathbf{1}\mathbf{1}'}{n}\right), \tag{9}$$

where $D$ is the n × n pairwise distance matrix, $D^{2}$ is its element-wise square, $I$ is the n × n identity matrix, and $\mathbf{1}$ is the vector of n ones. To ensure that the kernel matrix $K$ is positive semi-definite, negative eigenvalues are replaced with zero []. Then, using the resulting kernel matrix, the variance-component score statistic is formulated as Eq. 10 [, ]:

$$U_{\text{MiRKAT-S}(k)} = (d - \hat{\Lambda})^{T} K_{(k)} (d - \hat{\Lambda}), \tag{10}$$

where $k$ indexes a particular kernel matrix based on a chosen distance metric. Plantinga et al. [] have also proposed a modified score statistic that accounts for over-dispersion. However, because we calculate p-values with a residual permutation-based method and the dispersion parameter, $\frac{1}{(d - \hat{\Lambda})^{T}(d - \hat{\Lambda})}$, is evaluated under the null, the variance-component score test statistic of Eq. 10 is sufficient in our study.

Importantly, different distance metrics reflect different relative contributions from microbial abundance and phylogenetic tree information; as such, the performance of MiRKAT-S differs according to the choice of distance metric and the true underlying association pattern [–, ]. The UniFrac distances are constructed from phylogenetic tree information, and the contribution of microbial abundance is modulated by different weighting schemes. The unweighted UniFrac distance incorporates only microbial presence/absence information, so it leans most heavily on phylogenetic tree information [], whereas the weighted UniFrac distance further incorporates microbial abundances []. In this context, the generalized UniFrac distance is regarded as a compromise between the unweighted and weighted UniFrac distances []. In contrast, the Bray-Curtis dissimilarity [] does not incorporate any phylogenetic tree information, so it leans most heavily on microbial abundance information. Accordingly, when the associated OTUs are phylogenetically related, the UniFrac distances can be better choices than the Bray-Curtis dissimilarity, and the reverse holds when they are not phylogenetically related.

However, we cannot predict which distance metric is optimal for our study. Therefore, we propose a data-driven approach, namely Optimal MiRKAT-S (OMiRKAT-S), which adapts across multiple distance metric specifications. Its test statistic is formulated as Eq. 11:

$$Q_{\text{OMiRKAT-S}} = \min_{k \in \Psi} P_{\text{MiRKAT-S}(k)}, \tag{11}$$

where $\Psi$ is a set of candidate distance metrics and $P_{\text{MiRKAT-S}(k)}$ is the estimated p-value for $U_{\text{MiRKAT-S}(k)}$, $k \in \Psi$. Note that OMiRKAT-S is similar to Optimal MiRKAT []; the difference is that OMiRKAT-S handles survival outcomes, while Optimal MiRKAT handles binary or continuous outcomes at a time point.
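To make the kernel construction and the score statistic described above more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`bray_curtis`, `distance_to_kernel`, `score_statistic`), the toy count table, and the residual values are all hypothetical. The residual vector stands in for $d - \hat{\Lambda}$ and is assumed to come from a null Cox model fitted elsewhere.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between the rows (samples) of `counts`."""
    x = np.asarray(counts, dtype=float)
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return num / den  # assumes every sample has at least one nonzero count


def distance_to_kernel(D):
    """Double-center the squared distances, then zero out negative eigenvalues."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # I - 11'/n
    K = -0.5 * J @ (D ** 2) @ J               # D ** 2 is the element-wise square
    K = (K + K.T) / 2.0                       # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)     # negative eigenvalues -> 0
    return (eigvecs * eigvals) @ eigvecs.T    # V diag(lambda) V'


def score_statistic(residuals, K):
    """U = r' K r, where r stands in for (d - Lambda_hat)."""
    r = np.asarray(residuals, dtype=float)
    return float(r @ K @ r)


# Hypothetical toy data: 4 samples x 3 OTUs, plus made-up null-model residuals.
counts = np.array([[10, 0, 5], [8, 1, 6], [0, 12, 3], [1, 9, 4]])
residuals = np.array([0.6, -0.2, -0.9, 0.5])

K = distance_to_kernel(bray_curtis(counts))
U = score_statistic(residuals, K)
print(f"U = {U:.4f}")
```

Bray-Curtis is used here only because it needs no phylogenetic tree; a UniFrac distance matrix computed by another tool could be passed to `distance_to_kernel` in the same way.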
Here again, $Q_{\text{OMiRKAT-S}}$ is the minimum p-value among the individual MiRKAT-S(k) tests, $k \in \Psi$, and it is itself a test statistic that requires its own p-value estimation. As with MiSALN(γ)/OMiSALN, a residual-based permutation method was used to estimate p-values for the individual MiRKAT-S(k) tests, $k \in \Psi$, and for OMiRKAT-S (see Additional file ).

For the set of candidate distance metrics, we used $\Psi$ = {unweighted UniFrac ($K_{U}$), generalized UniFrac(0.5) ($K_{0.5}$), weighted UniFrac ($K_{W}$), Bray-Curtis ($K_{BC}$)}, where $K_{0.5}$ corresponds to the generalized UniFrac distance with the parameter ϴ = 0.5, as suggested []. [...]

For visual representation of discoveries, we used an existing software tool, **GraPhlAn** [], which is discussed later in our real data analysis. Because GraPhlAn is flexibly customizable and produces attractive circular representations of hierarchical taxonomic trees [], we do not introduce any new graphical representation here and suggest using GraPhlAn after obtaining results from OMiSA. […]
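The minimum-p-value idea behind OMiRKAT-S can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: their exact residual-based permutation scheme is given in their Additional file and is not reproduced here, so plain shuffling of the residual vector is used as a stand-in. The function name `omirkat_s_sketch`, its parameters `n_perm` and `seed`, and the two random kernels `K_a` and `K_b` (placeholders for the four kernels in Ψ) are hypothetical.

```python
import numpy as np


def omirkat_s_sketch(residuals, kernels, n_perm=999, seed=0):
    """Minimum-p-value test across kernels, calibrated by permutation."""
    rng = np.random.default_rng(seed)
    r = np.asarray(residuals, dtype=float)
    names = list(kernels)

    def stats(vec):
        # One score statistic vec' K vec per candidate kernel.
        return np.array([vec @ kernels[name] @ vec for name in names])

    u_obs = stats(r)
    # Assumption: the null is generated by shuffling the residual vector.
    u_null = np.array([stats(rng.permutation(r)) for _ in range(n_perm)])

    # Individual permutation p-values, one per kernel.
    p_obs = (1 + (u_null >= u_obs).sum(axis=0)) / (n_perm + 1)
    q_obs = p_obs.min()  # the minimum p-value is the test statistic Q

    # Null distribution of Q: rank each permuted statistic within its own
    # kernel's permutation set, then take the minimum across kernels.
    p_null = (u_null[None, :, :] >= u_null[:, None, :]).sum(axis=1) / n_perm
    q_null = p_null.min(axis=1)

    p_value = (1 + (q_null <= q_obs).sum()) / (n_perm + 1)
    return {"p_individual": dict(zip(names, p_obs.tolist())),
            "Q": float(q_obs), "p_value": float(p_value)}


# Hypothetical toy input: random residuals and two random PSD kernels.
rng = np.random.default_rng(1)
n = 20
residuals = rng.normal(size=n)
features_a = rng.normal(size=(n, 3))
features_b = rng.normal(size=(n, 3))
kernels = {"K_a": features_a @ features_a.T, "K_b": features_b @ features_b.T}

result = omirkat_s_sketch(residuals, kernels, n_perm=199)
print(result)
```

Reusing the same permutations for every kernel keeps the dependence between the individual tests intact, which is why the smallest individual p-value cannot simply be reported as the final p-value.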

## Pipeline specifications

Software tools | MiRKAT-S, OMiSA, UniFrac, GraPhlAn |
---|---|
Applications | Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis |
Organisms | Homo sapiens |