Data Quest

A Data Science Blog.

Random Forest: And How It's Better Than Bagging


Random Forests and Bootstrap Aggregating

As we know, among simple averaging methods RFs perform better than bootstrap aggregating (bagging): both reduce the high variance (see the bias-variance tradeoff) of a single classifier by repeatedly drawing independent samples, with replacement, from the same distribution (the dataset). This performance gain of RFs is described very well in Leo Breiman's (2001) paper, and also in The Elements of Statistical Learning (Hastie, Tibshirani, Friedman), in the section on random forests.

I will briefly describe the idea behind the performance gain of RFs over bagging here. In both ensembling methods the base classifier is a decision tree, which is very prone to high variance: when it makes use of all the available features, it tends to over-fit, since a more complex tree dependence structure is generated. Averaging over different classifiers with low collinearity among them decreases the overall complexity of the final ensembled model. But including all the features is not a good idea, since many features may contribute little to the overall information gain of an individual classifier and can often prove to be a source of noise. To overcome this problem, L. Breiman proposed the idea of selecting, at each split, a random subset of fixed size from the available features.
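To make the contrast concrete, here is a minimal scikit-learn sketch on a synthetic dataset (an illustration only, not the experimental setup from the references above): bagging grows each tree on a bootstrap sample but considers all features at every split, while the random forest additionally restricts each split to a random feature subset via `max_features`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a real dataset; most features are noise.
X, y = make_classification(n_samples=1000, n_features=40,
                           n_informative=10, random_state=0)

# Bagging: bootstrap samples, but ALL features considered at each split
# (the default base estimator is a full decision tree).
bagging = BaggingClassifier(n_estimators=200, random_state=0)

# Random forest: bootstrap samples PLUS a random subset of features
# (sqrt of the total here) at each split, which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

On data with many noisy features the forest usually edges out bagging, though the size of the gap depends on the dataset.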

AdaBoost: Why It Is Robust to Overfitting


Introduction

Bagging vs. Boosting

There are two sources of error in an estimator: variance and bias. A model that is too complex (e.g., an unpruned decision tree) has high variance but low bias, whereas a model that is too simple (a weak learner like a decision stump) has high bias but low variance. Minimizing these two types of error calls for two different approaches: bagging for complex, high-variance models and boosting for simple, high-bias models.

In more general terms, bagging (also known as bootstrap aggregating) builds complex models over many bootstrapped samples drawn from the same distribution (the dataset) and yields a simpler but more robust model, either by simple averaging or by majority voting over those complex models. Boosting, in contrast, builds a strong learner out of weak learners by iteratively putting more weight on the examples misclassified so far, over a finite number of iterations. In this post we will try to shed some light on the learning approach of the boosting meta-algorithm.
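To illustrate the re-weighting idea, here is a minimal NumPy/scikit-learn sketch of the classic discrete AdaBoost loop, assuming binary labels in {-1, +1} (the function names are illustrative, not from any library):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal discrete AdaBoost on depth-1 stumps; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak learner
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    scores = sum(a * s.predict(X) for a, s in zip(stumps, alphas))
    return np.sign(scores)
```

Each round fits a stump to the current weight distribution, then multiplies the weights of misclassified points by exp(alpha), so the next stump is forced to concentrate on the examples the ensemble still gets wrong.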

FrugalKmeans: Restaurants' Home Delivery Route Optimizer


Introduction

What if I could reduce the home-delivery cost of restaurants around my city? Maybe optimize routes, cluster restaurants into similar regions, and assign delivery men to each cluster. Would this help? Let's find out.

FrugalKmeans tries to analyze this for my favorite city, Lucknow, India, which is known for its exquisite Awadhi cuisine.
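As a first cut, clustering restaurant locations with k-means might look like the sketch below. The coordinates and the choice of five clusters are made up for illustration, and Euclidean distance on latitude/longitude is only a rough proxy within a single city:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (lat, lon) pairs scattered around Lucknow; real input
# would come from a geocoded restaurant listing.
rng = np.random.default_rng(0)
coords = rng.normal(loc=[26.85, 80.95], scale=0.05, size=(120, 2))

# Cluster restaurants into delivery regions; one delivery team per region.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(coords)

for region in range(kmeans.n_clusters):
    members = np.sum(kmeans.labels_ == region)
    center = kmeans.cluster_centers_[region]
    print(f"region {region}: {members} restaurants, hub near "
          f"({center[0]:.3f}, {center[1]:.3f})")
```

Each cluster center then serves as a candidate dispatch hub, and routes can be optimized within a region instead of across the whole city.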

After all, FRUGALITY IS OPTIMAL.