A collection of diverse models can often produce a much better result than any of the individual models. This is the basic idea behind ensembling. It is very popular in machine learning competitions and is a key ingredient in winning Kaggle competitions.
Let's understand the power of a combined model with an example.
Let's say we have a test set of 10 samples and the ground truth is all positive:
1111111111
We have three machine learning classifiers, X, Y and Z, each of which independently predicts 80% of the samples correctly.
For a majority vote among the three classifiers, there are four possible outcomes:
1. All three classifiers are correct:
   0.8 * 0.8 * 0.8 = 0.512
2. Two are correct:
   0.8 * 0.8 * 0.2 + 0.8 * 0.2 * 0.8 + 0.2 * 0.8 * 0.8 = 0.384
3. One is correct:
   0.8 * 0.2 * 0.2 + 0.2 * 0.8 * 0.2 + 0.2 * 0.2 * 0.8 = 0.096
4. None are correct:
   0.2 * 0.2 * 0.2 = 0.008
Hence the probability that the majority vote is correct = 0.512 + 0.384 = 0.896.
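To make the arithmetic concrete, here is a minimal Python sketch that reproduces the number above. Only the setup from the example (three independent classifiers, each correct with probability 0.8) is taken from the post; the enumeration itself is just an illustration.

# Sketch: enumerate every correct/incorrect pattern for three independent
# classifiers that are each right with probability 0.8, and sum the
# probability of the patterns where the majority (at least 2 of 3) is correct.
from itertools import product

p = 0.8
prob_majority_correct = 0.0
for pattern in product([True, False], repeat=3):   # True = classifier is correct
    prob = 1.0
    for correct in pattern:
        prob *= p if correct else (1 - p)
    if sum(pattern) >= 2:                          # majority of the three is correct
        prob_majority_correct += prob

print(prob_majority_correct)  # 0.896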
This is almost a 10% increase in accuracy. In practice, it is much easier to train three different models that each reach 80% accuracy than to train a single model that reaches 90% accuracy.
The key factor to note is that the individual models should be as weakly correlated with each other as possible. If the models are highly correlated, their errors overlap, the independence assumption behind the calculation above breaks down, and there will be no substantial improvement in the accuracy of the ensemble.
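As a rough sketch of what such a majority-vote ensemble looks like in code, here is an illustrative example using scikit-learn's VotingClassifier with three different model families, which tend to make less correlated errors than three copies of the same model. The synthetic dataset and the particular estimators are assumptions for illustration, not part of the example above.

# Minimal hard-voting ensemble sketch (assumed, illustrative setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative synthetic data; replace with your own dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # hard voting = majority vote over predicted labels
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))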