21 Jun Classification Methods - Episode 5: Bagging, Random Forests and Boosting
1:00 pm - 2:30 pm
statistics seminar | level: intermediate
for questions related to this event, contact firstname.lastname@example.org
affiliation: Ghent University
“Unity is strength” and “wisdom of the crowd” are both sayings that capture the idea behind the powerful family of “ensemble methods”. Ensemble methods train multiple models (often called “weak learners”) on the same problem and combine their predictions to obtain more accurate and/or more robust results. This seminar introduces three popular ensemble methods: Bagging, Random Forests and Boosting. The methods are illustrated in R using the Bankloan data set.
Bagging is short for Bootstrap Aggregation. In Bagging, trees are fitted to bootstrap replicates of the training data set, that is, samples of the same size drawn with replacement. The fitted trees are then combined: when predicting a categorical outcome (classification), the aggregated response is a plurality vote over the trees. In R, bagging is available in the ipred package.
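To make the bootstrap-and-vote idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not the ipred interface used in the seminar: the one-split "stump" learner, the function names and the toy data are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split tree (stump) on (x, label) pairs by minimising training errors."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        # Predict the majority label on each side of the split.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def bagging(rows, n_trees=25, seed=0):
    """Fit n_trees stumps on bootstrap replicates and combine them by plurality vote."""
    rng = random.Random(seed)
    # Each bootstrap replicate draws len(rows) rows with replacement.
    trees = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

    def predict(x):
        votes = Counter(tree(x) for tree in trees)
        return votes.most_common(1)[0][0]  # plurality vote

    return predict

# Hypothetical toy data: one numeric predictor, two classes.
data = [(x, "default" if x < 5 else "repaid") for x in range(10)]
model = bagging(data)
print(model(0), model(9))
```

Individual stumps can disagree near the class boundary because each one sees a different bootstrap replicate; the vote averages that variability out.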
The Random Forest technique is Bagging plus a “perturbation” of the tree-growing algorithm. The specific form of the perturbation is called “subset splitting.” In an ordinary decision tree, every possible split point of every variable is evaluated at each node. In a Random Forest, only a random subset of the variables is considered at each node. In R, Random Forest for classification is available in the randomForest package (function randomForest).
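The difference between an ordinary tree split and a Random Forest split can be sketched as follows. This is an illustrative plain-Python example, not the randomForest package: the Gini criterion, the helper names, the parameter name mtry and the toy data are assumptions.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y, mtry, rng):
    """Return ((score, feature, threshold), candidates) using only mtry random features."""
    n_features = len(X[0])
    # The Random Forest perturbation: only a random subset of variables is examined.
    candidates = rng.sample(range(n_features), mtry)
    best = None
    for j in candidates:
        # Every split point of each candidate variable is still evaluated.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            # Weighted impurity of the two child nodes (lower is better).
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best, candidates

# Hypothetical toy data: feature 0 separates the classes perfectly at 3.
X = [[1, 7, 0], [2, 3, 1], [3, 9, 0], [4, 1, 1], [5, 8, 0], [6, 2, 1]]
y = ["no", "no", "no", "yes", "yes", "yes"]

full_split, _ = best_split(X, y, mtry=3, rng=random.Random(0))    # ordinary tree: all variables
sub_split, sampled = best_split(X, y, mtry=1, rng=random.Random(1))  # forest-style node
```

With mtry equal to the number of variables the search reduces to the ordinary tree split; with a smaller mtry the node may have to settle for a weaker variable, which decorrelates the trees in the forest.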
Boosting attempts to take many “weak learners” and form a powerful “committee.” Two main algorithms will be discussed, AdaBoost and gradient boosting, and they will be implemented in R using the xgboost and gbm packages.
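As a rough illustration of how a committee of weak learners is built, the sketch below implements discrete AdaBoost with decision stumps in plain Python. It does not use xgboost or gbm; the ±1 label coding, the function names and the toy data are assumptions made for the example.

```python
import math

def stump_predict(x, threshold, sign):
    """A stump predicts `sign` on one side of the threshold and `-sign` on the other."""
    return sign if x <= threshold else -sign

def fit_weighted_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best stump under weights w."""
    best = None
    for threshold in xs:
        for sign in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform observation weights
    committee = []
    for _ in range(rounds):
        err, threshold, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        committee.append((alpha, threshold, sign))
        # Up-weight misclassified points, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * y * stump_predict(x, threshold, sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        score = sum(a * stump_predict(x, t, s) for a, t, s in committee)
        return 1 if score >= 0 else -1

    return predict

# Hypothetical toy data that no single stump can classify correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
```

On this toy set a single stump misclassifies three points, while the weighted vote of three stumps classifies every training point correctly.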
Alboukadel Kassambara. Machine Learning Essentials: Practical Guide in R. STHDA, 2018.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, New York, 2013.
Brett Lantz. Machine Learning with R.
Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.