Computational protocol: Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

Protocol publication

[…] We used conventional regression trees, bagged regression trees, random forests, and boosted regression trees to predict the probability of 30-day mortality for patients hospitalized with cardiovascular disease. Readers are referred elsewhere for details on these tree-based methods (Clark and Pregibon; Freund and Schapire; Breiman et al.; Friedman et al.; Breiman; Hastie et al.; McCaffrey et al.; Bühlmann and Hothorn).

For bagged regression trees, a regression tree was grown in each of 100 bootstrap samples. For random forests, 500 regression trees were grown. When fitting random forests of regression trees, we set the number of randomly selected predictor variables considered at each binary split to ⌊p/3⌋, where p denotes the total number of predictor variables and ⌊·⌋ denotes the floor function (this is the default in the R implementation of random forests). For boosted regression trees, we considered four different base regression models: regression trees of depth one through four (also referred to as regression trees with interaction depths one through four). For each of these, we considered sequences of 10,000 regression trees.

For all methods, we used implementations available in R statistical software (R version 2.11.1, R Foundation for Statistical Computing, Vienna, Austria). We grew conventional regression trees using the rpart package (version 3.1-46). The optimal size of each regression tree was determined using cross-validation, and the trees were then pruned.
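The bagging and random-forest settings described above can be illustrated with a short sketch. This is not the authors' code: the study used R packages, whereas the block below is plain Python, uses depth-one stumps in place of fully grown regression trees to stay short, and runs on synthetic data. The function names (`fit_stump`, `fit_ensemble`, `predict_ensemble`), the three predictors, and the toy outcome rule are all assumptions made for illustration.

```python
import random


def fit_stump(X, y, features):
    """Depth-one regression tree: choose the split with the lowest squared error."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split in this sample: predict the overall mean
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_ensemble(X, y, n_trees, mtry=None, seed=0):
    """Grow one stump per bootstrap sample; mtry limits the candidate predictors."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        feats = range(p) if mtry is None else rng.sample(range(p), mtry)
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return stumps


def predict_ensemble(stumps, row):
    """Average the individual predictions to get a predicted probability."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)


# Synthetic stand-in data (hypothetical): columns are age, systolic BP, heart rate.
data_rng = random.Random(1)
X = [[data_rng.uniform(40, 90), data_rng.uniform(80, 180), data_rng.uniform(50, 120)]
     for _ in range(100)]
y = [1 if row[0] > 70 else 0 for row in X]  # toy outcome: death driven by age alone

p = len(X[0])
mtry = max(1, p // 3)  # floor(p / 3), kept at 1 or more

bagged = fit_ensemble(X, y, n_trees=100)             # every predictor is a candidate
forest = fit_ensemble(X, y, n_trees=500, mtry=mtry)  # random candidate subset per split

older, younger = [85, 120, 80], [45, 120, 80]
print(predict_ensemble(bagged, older), predict_ensemble(bagged, younger))
print(predict_ensemble(forest, older), predict_ensemble(forest, younger))
```

The only difference between the two ensembles in this sketch is the `mtry` argument: bagging lets every split search all predictors, while the random forest restricts each split to a random subset of size ⌊p/3⌋. Boosting is not shown here.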
For bagging, random forests, and boosted regression trees, we used the ipred package (version 0.8-8), the randomForest package (version 4.5-36), and the gbm package (version 1.6-3.1), respectively.

We used two different logistic regression models to predict the probability of 30-day mortality, both of which consisted of only main effects. In the first logistic regression model, all continuous covariates were assumed to have a linear relationship with the log-odds of death. The second logistic regression model used restricted cubic smoothing splines with four knots and three degrees of freedom to model the relationship between continuous covariates and the log-odds of death (Harrell). For both logistic regression models, all candidate predictors were included, and no variable reduction was used. We estimated the first logistic regression model in R and used functions from the Design library (version 2.3-0) to estimate the model that incorporated restricted cubic smoothing splines.

For comparison, we assessed the predictive performance of the above methods against previously developed disease-specific mortality prediction models. The GRACE (Global Registry of Acute Coronary Events) score was derived and validated for predicting mortality in patients hospitalized with acute coronary syndromes (Granger et al.). The score comprises the following variables: Killip class, systolic blood pressure, heart rate, age, and creatinine level. In the acute myocardial infarction (AMI) sample, 30-day mortality was regressed on the GRACE score using a univariable logistic regression model (instead of entering the components of the score separately). We used the GRACE score because a recent systematic review showed that it predicts mortality in patients with acute coronary syndromes more accurately than other scores (D'Ascenzo et al.).
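Returning to the second logistic regression model: a restricted cubic spline with k knots contributes k - 1 terms per covariate (one linear term and k - 2 nonlinear terms), which is why four knots correspond to three degrees of freedom. The sketch below builds that basis in plain Python using the standard truncated-power form. It is an illustration only, not the Design library code used in the study. The function name `rcs_basis`, the knot locations, and the scaling by the squared distance between the outer knots are assumptions chosen for the example.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline with k knots."""
    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    t_first, t_prev, t_last = knots[0], knots[-2], knots[-1]
    basis = [float(x)]  # the linear term
    for t_j in knots[:-2]:  # one nonlinear term for each of the first k - 2 knots
        term = (cube_plus(x - t_j)
                - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
                + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev))
        # Scaling keeps the nonlinear terms on roughly the same scale as x.
        basis.append(term / (t_last - t_first) ** 2)
    return basis


# Hypothetical knot locations for age in years (not taken from the study).
knots = [45.0, 62.0, 74.0, 88.0]
terms = rcs_basis(70.0, knots)
print(len(terms), terms)  # three terms: one linear, two nonlinear
```

The "restricted" part is visible in the construction: the nonlinear terms are zero below the first knot and their cubic and quadratic parts cancel above the last knot, so the fitted curve is linear in both tails. Each continuous covariate would be expanded this way before entering the logistic regression.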
The EFFECT-HF mortality prediction model is a logistic regression model that has been derived and validated for predicting 30-day and one-year mortality in patients hospitalized with congestive heart failure (CHF) (Lee et al.). The model for predicting 30-day mortality uses the following variables: age, systolic blood pressure, respiratory rate, sodium, urea, history of stroke or transient ischemic attack, dementia, chronic obstructive pulmonary disease, cirrhosis of the liver, and cancer. In the CHF sample, 30-day mortality was regressed on the individual variables in the EFFECT-HF model. […]
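Regressing 30-day mortality on individual variables amounts to fitting a main-effects logistic regression. The sketch below shows the idea in plain Python under several assumptions: only two of the listed predictors are used (age and systolic blood pressure, both centered and rescaled), the sixteen rows are invented, and the model is fitted by simple gradient ascent on the log-likelihood rather than by the routines of any statistical package. The function names and the resulting coefficients are illustrative and have no connection to the published EFFECT-HF model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(beta, row):
    """Predicted probability of death for one patient; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def fit_logistic(X, y, step=0.1, n_iter=5000):
    """Main-effects logistic regression fitted by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, yi in zip(X, y):
            resid = yi - predict_risk(beta, row)
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


# Invented rows: (age in decades, centered; systolic BP in 10 mmHg units, centered; died).
rows = [
    (-2, 2, 0), (-2, 0, 0), (-1, 1, 0), (-1, -1, 0), (0, 2, 0), (0, 0, 0),
    (0, -2, 1), (1, 1, 0), (1, -1, 1), (1, 1, 1), (2, 0, 1), (2, -2, 1),
    (-1, -2, 1), (2, 2, 0), (1, -1, 0), (-1, 1, 1),
]
X = [[age, sbp] for age, sbp, _ in rows]
y = [died for _, _, died in rows]

beta = fit_logistic(X, y)
high_risk = predict_risk(beta, [2, -2])  # older patient with low blood pressure
low_risk = predict_risk(beta, [-2, 2])   # younger patient with high blood pressure
print(beta, high_risk, low_risk)
```

Entering a single composite score instead, as was done for GRACE, would use the same fitting routine with one column in `X`.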

Pipeline specifications

Software tools: ipred, randomForest
Application: Miscellaneous
Organisms: Homo sapiens
Diseases: Cardiovascular Diseases, Heart Failure, Myocardial Infarction