AdaBoost implementation and tuning for a high-dimensional feature space in R
I am trying to apply the AdaBoost.M1 algorithm (with trees as base learners) to a data set with a large feature space (~20,000 features) and ~100 samples in R. A variety of packages exist for this purpose: adabag, ada and gbm. gbm() (from the gbm package) appears to be my only viable option, since the others fail with a stack overflow, and although it works, it is very time-consuming.
Questions:
- Is there any way to overcome the stack overflow problem in the other packages, or to make gbm() run faster? I have tried converting the data.frame into a matrix, without success.
- When performing AdaBoost with gbm() (with the distribution set to "adaboost"), An Introduction to Statistical Learning (James et al.) mentions the following parameters needed for tuning (see the sketch after this list):
- The total number of trees to fit (B).
- The shrinkage parameter, denoted lambda.
- The number of splits in each tree (d), controlling the complexity of the boosted ensemble.
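For reference, here is a minimal sketch of how those three parameters map onto gbm() arguments (the data.frame df and its 0/1 response column y are hypothetical, and the values are placeholders rather than recommendations):

```r
library(gbm)

# Hypothetical data: df is a data.frame with a 0/1 response `y`
# and ~20,000 predictor columns.
set.seed(1)
fit <- gbm(y ~ ., data = df,
           distribution = "adaboost",  # exponential (AdaBoost) loss
           n.trees = 2000,             # total number of trees (B)
           shrinkage = 0.01,           # shrinkage parameter (lambda)
           interaction.depth = 1,      # splits per tree (d); 1 = stumps
           n.minobsinnode = 5,         # small terminal nodes, given only ~100 samples
           cv.folds = 5)               # built-in cross-validation

# Pick the number of trees that minimizes the cross-validated error curve.
best.iter <- gbm.perf(fit, method = "cv")
```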
As the algorithm is very time-consuming to run in R, I need to find literature on suitable ranges for these tuning parameters for this kind of large-feature-space data before performing cross-validation over that range to estimate the test error rate.
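In the absence of problem-specific references, a common starting point for p >> n problems is stumps or depth-2 trees with small shrinkage (e.g. 0.001 to 0.01) and correspondingly many trees, searched by cross-validation. A hedged sketch of such a grid search using the caret package (x, y and the grid values are illustrative assumptions, not recommendations):

```r
library(caret)

# Illustrative tuning grid; with ~20,000 features and ~100 samples,
# shallow trees and small shrinkage are a common starting point.
grid <- expand.grid(n.trees = c(500, 1000, 2000),
                    interaction.depth = c(1, 2),
                    shrinkage = c(0.001, 0.01, 0.1),
                    n.minobsinnode = 5)

ctrl <- trainControl(method = "cv", number = 5)

# x: data.frame of predictors; y: two-level factor response.
set.seed(1)
tuned <- train(x = x, y = y, method = "gbm",
               distribution = "adaboost",  # passed through to gbm()
               trControl = ctrl, tuneGrid = grid,
               verbose = FALSE)
tuned$bestTune  # best parameter combination found by cross-validation
```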
Any suggestions?
Topic adaboost boosting gbm r machine-learning
Category Data Science