Computational protocol: A gradient-boosted model analysis of the impact of body mass index on the short-term outcomes of critically ill medical patients

Protocol publication

[…] Continuous variables were assessed for normality in survivors and non-survivors using the Kolmogorov-Smirnov test. Normally distributed variables were compared between the groups using t-tests, and non-normally distributed variables were compared using the Mann-Whitney test. Categorical variables were compared using a chi-squared test.
This analysis used a gradient-boosted model (GBM) to evaluate the influence of illness severity, burden of comorbidities, body mass index (BMI) and performance status (PS) on hospital mortality. A GBM is an additive regression model whose terms are decision trees obtained by simple recursive partitioning. After the first decision tree is built from the data, another tree is fitted to the residuals of the first tree. This process continues until pre-specified stopping criteria are reached, producing hundreds or thousands of trees that the boosting algorithm combines into the final model. GBM has several advantages over traditional logistic regression: it needs no prior variable transformations, is insensitive to outliers, can fit non-linear relationships and can handle missing data. GBM is frequently used to describe findings and patterns but is seldom used in critical care medicine. In some scenarios, GBM has provided information that other methods did not. The GBM considered age, SAPS3adj, PS, CCI, previous steroid use, length of stay (LOS) before ICU admission, temperature at admission, diagnosis of sepsis at admission and BMI. GBM settings were chosen to obtain at least one thousand trees. The initial settings included a bag fraction of 0.5, a tree complexity of 8, and a learning rate of 0.001. Ten-fold cross-validation was used.
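The residual-fitting loop described above can be illustrated with a minimal Python sketch. This is not the authors' code (their analysis was run in R with the gbm and dismo packages) and it makes several simplifying assumptions: a single predictor, one-split trees ("stumps") rather than a tree complexity of 8, squared-error loss rather than a loss suited to a binary outcome such as mortality, and a larger learning rate with fewer trees so that the toy example converges quickly. The function names and the synthetic data are hypothetical.

```python
import random


def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) for the one-split tree
    that minimises squared error on a single predictor."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x is identical, so no split exists: predict the mean everywhere.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=200, learning_rate=0.1, bag_fraction=0.5, seed=0):
    """Fit each new stump to the residuals of the current ensemble, using a
    random subsample (the 'bag') of the rows at every iteration."""
    rng = random.Random(seed)
    baseline = sum(ys) / len(ys)          # initial prediction: the overall mean
    preds = [baseline] * len(ys)
    stumps = []
    bag_size = max(2, int(bag_fraction * len(xs)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), bag_size)
        stump = fit_stump([xs[i] for i in idx],
                          [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return baseline, stumps, learning_rate


def predict(model, x):
    baseline, stumps, learning_rate = model
    return baseline + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Synthetic, non-linear toy data (not patient data).
xs = [i / 10 for i in range(40)]
ys = [(x - 2.0) ** 2 for x in xs]

model = fit_boosted(xs, ys)
mse_baseline = sum((y - model[0]) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_baseline, mse_boosted)
```

A smaller learning rate, such as the 0.001 reported here, makes each tree contribute less, which is why many more trees are then needed to reach a comparable fit.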
We report the relative influence of each variable on the model using the method suggested by Friedman: the influence of a variable is proportional to the number of times it was used for node splitting, weighted by the squared improvement to the model resulting from each split and averaged over all trees, and is then scaled from 0 to 100. The influence of each variable on the outcome is shown in partial dependence plots. The strength of second-order interactions was assessed, for each pair of predictors, by the mean value of the residuals of a linear model relating the model predictions for that pair to the two predictors fitted as factors. When necessary, interaction plots are shown.
Hospital LOS after ICU admission among hospital survivors was analyzed with multiple linear regression. The same variables included in the mortality model were used in this analysis, and interactions were allowed; however, no stepwise analysis was performed.
All analyses were performed using R version 3.0.2 (www.r-project.org) with the gbm, dismo and ggplot2 packages. We used the documentation of the dismo package and a previous review on the subject as a guide to these analyses. […]
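As a rough illustration of the relative-influence calculation described above, the following Python sketch sums the squared improvement of every split per variable, averages over trees, and rescales. It assumes the scaling makes the influences sum to 100; the source states only a 0 to 100 scale. The function name, the variable names and the improvement values are invented for illustration and are not results from the study.

```python
from collections import defaultdict


def relative_influence(trees):
    """trees: list of trees, each a list of (variable, squared_improvement)
    pairs, one pair per node split. Returns {variable: influence}."""
    totals = defaultdict(float)
    for splits in trees:
        for variable, improvement in splits:
            # A variable used in more splits, or in more useful splits,
            # accumulates a larger total.
            totals[variable] += improvement
    n_trees = len(trees)
    averaged = {v: total / n_trees for v, total in totals.items()}
    grand_total = sum(averaged.values())
    if grand_total == 0:
        return {v: 0.0 for v in averaged}
    # Assumed normalisation: influences sum to 100.
    return {v: 100.0 * a / grand_total for v, a in averaged.items()}


# Hypothetical split records for two small trees.
example_trees = [
    [("var_a", 6.0), ("var_b", 2.0)],
    [("var_a", 3.0), ("var_c", 1.0)],
]
influence = relative_influence(example_trees)
for name, value in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 2))
```

Because of the final rescaling, the values are comparable only within one fitted model: they rank predictors against each other rather than measuring an absolute effect size.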

Pipeline specifications

Software tools dismo, ggplot2
Applications Miscellaneous, Phylogenetics
Organisms Homo sapiens