Wednesday, 21 June 2017

A Comparative Study of Some Supervised Learning Models


Gaussian Naive Bayes
Strengths
  • Requires less training data than discriminative models such as Logistic Regression
Weakness
  • Assumes that features are conditionally independent of one another, which rarely holds in practice
  • Simple representation with little opportunity for hyperparameter tuning (see the sketch after this list)
  • Not well suited to large datasets
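
To make this concrete, here is a minimal sketch using scikit-learn's GaussianNB. The Iris dataset and the 70/30 split are illustrative assumptions, not part of the comparison; the point to notice is that the model exposes essentially nothing to tune.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative data: Iris, split 70/30 (an assumption for this sketch)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# GaussianNB has essentially no hyperparameters: it just estimates a mean
# and variance per feature per class, assuming feature independence
clf = GaussianNB()
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))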



Logistic Regression
Strengths
  • Works well with correlated features
  • There are many ways to regularize such a model so as to avoid overfitting
  • Unlike SVMs, we can easily take in new data for training using online gradient descent (see the sketch after this list)
Weakness
  • Requires much more data than a generative model like Naive Bayes to achieve good accuracy
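
As a rough sketch (assuming scikit-learn; the synthetic data is made up for illustration), the snippet below shows both points: the C parameter controls L2 regularization in LogisticRegression, and SGDClassifier with logistic loss supports online updates via partial_fit.

import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Synthetic, roughly linearly separable data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Batch fit with L2 regularization; smaller C means a stronger penalty
clf = LogisticRegression(C=1.0, penalty="l2")
clf.fit(X, y)

# Online learning: partial_fit takes in new data incrementally,
# which a standard kernel SVM cannot do without retraining
online = SGDClassifier(loss="log_loss")  # logistic loss ("log" in older versions)
for batch in np.array_split(np.arange(len(X)), 5):
    online.partial_fit(X[batch], y[batch], classes=np.array([0, 1]))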


Support Vector Machines (SVMs)
Strengths
  • Kernel Trick: users can build expert knowledge about the problem into the model by engineering the kernel (see the sketch after this list)
  • SVMs have regularization parameters to tolerate some errors and avoid over-fitting
  • SVMs can be more robust even when the training samples contain some bias
Weakness
  • High computational costs
  • Users may need domain knowledge to choose and tune kernel functions
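
A minimal sketch of both points, assuming scikit-learn's SVC (the two-moons data and the gamma and C values are illustrative choices): the kernel and its gamma parameter are where domain knowledge enters, and C is the regularization parameter that trades training errors for a larger margin.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Illustrative nonlinear data
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# kernel="rbf" applies the kernel trick; gamma shapes the decision boundary
# and is a place where domain knowledge (or careful tuning) is needed.
# Lower C tolerates more training errors in exchange for a wider margin.
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))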


Decision Tree
Strengths
  • Decision Trees implicitly perform variable screening or feature selection
  • Decision Trees require relatively little effort from users for data preparation
  • Nonlinear relationships between parameters do not affect tree performance
Weakness
  • Decision Trees are extremely sensitive to small perturbations in the training data; a slight change can produce a drastically different tree
  • They can easily overfit. Although this can be mitigated by validation methods and pruning, a lot of research still needs to be done in this area
  • If two features explain the same thing, a decision tree keeps only the better of the two and neglects the other, while many other learning algorithms consider both. As a result, a decision tree may not use all the good features available in a dataset (see the sketch after this list)
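
The sketch below, assuming scikit-learn (Iris and the depth limit are illustrative), shows both the implicit feature selection and a simple pre-pruning control: features the tree never splits on get an importance of zero, and max_depth curbs overfitting.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth is a simple form of pre-pruning against overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Implicit feature selection: features the tree never split on get
# importance 0, so redundant or correlated features may be neglected entirely
print(tree.feature_importances_)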

Ensemble Methods
Strengths
  • Ensemble methods average out bias
  • They help in reducing variance (see the sketch at the end of this section)
  • They are less likely to overfit than a single model
Weakness
  • Correlations between classifiers from different types of learners are difficult to learn
  • Learning time and memory requirements can be high
  • The learned concept can be difficult to interpret
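
As a closing sketch (assuming scikit-learn; the dataset, tree count, and 5-fold cross-validation are illustrative), a random forest averages many high-variance trees, trading extra training time and memory for lower variance.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Averaging 100 bootstrapped trees reduces variance; more estimators
# cost more training time and memory, the trade-off noted above
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())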