Thursday, 20 July 2017

Power of Ensembling

A collection of models can result in a much better model than any of the individual models. This is the basic idea behind ensembling. It is very popular in machine learning competitions and is often a key ingredient in winning Kaggle competitions.
Let's understand the power of a combined model with an example.

Say we have a test set of 10 samples whose ground truth is all positive:
1111111111
We have three machine learning classifiers, X, Y and Z, each of which independently predicts 80% of the samples correctly.

For a majority vote with three members, there are four possible outcomes:

1. All three classifiers are correct
   = 0.8 * 0.8 * 0.8 = 0.512

2. Exactly two are correct
   = 0.8 * 0.8 * 0.2 + 0.8 * 0.2 * 0.8 + 0.2 * 0.8 * 0.8 = 0.384

3. Exactly one is correct
   = 0.8 * 0.2 * 0.2 + 0.2 * 0.8 * 0.2 + 0.2 * 0.2 * 0.8 = 0.096

4. None are correct
   = 0.2 * 0.2 * 0.2 = 0.008

Hence the probability that the majority vote is correct = 0.512 + 0.384 = 0.896
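
A quick way to sanity-check this arithmetic is to simulate the vote. Here is a minimal Python sketch, assuming the three classifiers are independent and each correct with probability 0.8.

import random

p = 0.8
# Closed form: P(all three correct) + P(exactly two correct)
analytic = p ** 3 + 3 * (p ** 2) * (1 - p)

# Monte Carlo simulation of the majority vote
trials = 100_000
hits = 0
for _ in range(trials):
    correct_votes = sum(random.random() < p for _ in range(3))
    hits += correct_votes >= 2

print(analytic)        # 0.896
print(hits / trials)   # roughly 0.896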

This is an almost 10% increase in accuracy. In practice, it is much easier to train three separate models that each reach 80% accuracy than to train a single model that reaches 90%.

The key factor to note is that the individual models should be as uncorrelated with each other as possible. If the models are highly correlated, the independence assumption behind the above calculation no longer holds and there will not be a substantial improvement in the accuracy of the ensemble.


Monday, 3 July 2017

A linear space implementation of XOR using Decision Trees (Decision Graphs)

Hi guys,
An implementation of XOR using decision trees is generally considered a bad idea, since it requires exponential memory. However, I think that if we model it using a decision graph instead, it can be much more efficient.

A tree-based implementation requires exponential space, as shown in the figure below.


A graph-based implementation only requires linear space and is hence much easier to work with, even for a human computing XOR by hand.
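
As a rough illustration, here is a small sketch comparing node counts for n-input XOR (parity), under the simplifying assumption that the decision graph keeps one node per parity state per input variable (as in a binary decision diagram), while the tree enumerates every input combination.

def tree_nodes(n):
    # A complete binary decision tree over n Boolean inputs:
    # 2^n leaves plus 2^n - 1 internal nodes.
    return 2 ** (n + 1) - 1

def graph_nodes(n):
    # Decision graph for parity: roughly two nodes per input variable
    # (one for "parity so far = 0", one for "parity so far = 1"),
    # plus the two terminal nodes.
    return 2 * n + 2

for n in (2, 4, 8, 16, 32):
    print(n, tree_nodes(n), graph_nodes(n))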




Wednesday, 21 June 2017

A Comparative Study of Some Supervised Learning Models


Gaussian Naive Bayes
Strengths
  • Requires less data than discriminative models such as Logistic Regression
Weakness
  • Features must ideally be independent of one another
  • Simple representation with little opportunity for hyperparameter tuning
  • Not suitable for big datasets



Logistic Regression
Strengths
  • Works well with correlated features
  • There are many ways to regularize such a model so as to avoid overfitting
  • Unlike SVMs, new training data can easily be incorporated using online gradient descent
Weakness
  • Requires much more data to achieve good accuracy


Support Vector Machines (SVMs)
Strengths
  • Kernel Trick: Users can build in expert knowledge about the problem via engineering the kernel
  • SVMs have regularization parameters to tolerate some errors and avoid over-fitting
  • SVMs can be more robust even if the training samples contain some bias
Weakness:
  • High computational costs
  • Users might need to have domain knowledge to use kernel functions


Decision Tree
Strengths
  • Decision Trees implicitly perform variable screening or feature selection
  • Decision Trees require relatively little effort from users for data preparation
  • Nonlinear relationships between parameters do not affect tree performance
Weakness
  • Decision Trees are extremely sensitive to small perturbations in the training data; a slight change can result in a drastically different tree.
  • They can easily overfit. This can be mitigated by validation methods and pruning, but a lot of research still needs to be done in this area
  • If two features explain the same thing, a decision tree picks only the better of the two and neglects the other, while many other learning algorithms would use both. As a result, a decision tree might not make use of all the good features available in the data

Ensemble Methods
Strengths
  • Ensemble methods average out bias
  • They help in reducing the variance
  • They are unlikely to overfit
Weakness
  • Correlations between classifiers built from different types of learners are difficult to account for
  • Learning time and memory constraints might be high
  • Learned concept might be difficult to understand
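
To make the comparison concrete, here is a minimal sketch using scikit-learn (assuming it is installed). The synthetic dataset and the default hyperparameters are purely illustrative; a Random Forest stands in for the ensemble methods.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# A small synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest (ensemble)": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")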

Wednesday, 7 June 2017

CreateMelody

Hi folks,
Today I bought my first domain name, createmelody.com. The initial computer vision part of my proposed music entry software is almost done. I am working on the web application and learning web development along the way. I feel really optimistic about the proposed software. I am also learning a lot about artificial intelligence and deep learning.

Tuesday, 30 May 2017

Text Detection Using Tesseract and OpenCV

Tesseract is a freely available, open source Optical Character Recognition (OCR) tool which can be used for text detection in images. Its development has been sponsored by Google, and it is the best open source OCR engine available for free. Today I will be using Tesseract to detect text in sheet music and remove it, so as to enhance the OMR software that I am building. The results obtained were incredible and show a lot of promise. Here are a few of the results.


Source Image



Text Removed Image
I used a number of extra conditions in addition to Tesseract, since Tesseract alone was reporting detections inside the staves. I have obtained quite good results for almost all the data that I tested, which gives me a lot of motivation to proceed with this project.
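
For reference, here is a minimal sketch of the basic idea using pytesseract and OpenCV; the file name, the confidence threshold and the extra stave-related filtering are placeholders rather than the exact conditions I used.

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread("sheet.png")  # placeholder path to a scanned sheet-music page
data = pytesseract.image_to_data(img, output_type=Output.DICT)

for i, text in enumerate(data["text"]):
    conf = int(float(data["conf"][i]))  # confidence reported by Tesseract
    # Keep only confident, non-empty detections; additional checks
    # (e.g. rejecting boxes that overlap the staves) would go here.
    if text.strip() and conf > 60:
        x, y, w, h = (data[key][i] for key in ("left", "top", "width", "height"))
        # Paint the detected text region white to "remove" it
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), -1)

cv2.imwrite("sheet_text_removed.png", img)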

Thursday, 18 May 2017

Using Optical Flow for Depth Estimation

Assume a stationary scene with the camera moving around; a typical example would be the view from a window on a train journey. If you look carefully, closer objects appear to move faster than distant objects. This property can be used to estimate depth in videos. Optical flow can estimate the apparent velocity at each point in a video, and this velocity can be used for relative depth estimation. OpenCV provides a very good implementation of optical flow, so we can obtain a fairly accurate relative depth map from it.

A very good explanation of optical flow using the Lucas-Kanade method can be found here. While the Lucas-Kanade method is useful for finding optical flow for a sparse set of points, we are actually interested in computing optical flow vectors for the entire image, i.e. for every pixel. There are several good implementations of dense optical flow in OpenCV.
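
For example, here is a minimal sketch using OpenCV's Farneback dense optical flow on two consecutive frames; the video path and the flow parameters are placeholders.

import cv2
import numpy as np

cap = cv2.VideoCapture("train_window.mp4")  # placeholder video path
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()
assert ok1 and ok2, "could not read two frames"

prev = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Dense optical flow: one (dx, dy) vector per pixel
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Faster-moving pixels are assumed to be closer to the camera, so the
# normalised flow magnitude acts as a rough relative (inverse) depth map.
depth_proxy = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("relative_depth.png", depth_proxy.astype(np.uint8))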

Even though the title of this post is about depth estimation using optical flow, I was unable to obtain a dataset with a moving camera and a stationary background. I will update this post with a depth map created with the help of optical flow at a later time.


Wednesday, 10 May 2017

Object Detector using SVM and HOG

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples.

While there may exist many hyperplanes separating the given data, the SVM outputs the optimal hyperplane dividing the two classes: the one with the largest minimum distance to the training examples.

Given a hyperplane, how do we form a decision rule that uses this decision boundary?

Consider a vector W perpendicular to the hyperplane, starting from the origin, and let U denote the vector from the origin to the unknown point. The projection of U onto W is given by U.W. If this projection is small, the unknown point lies in one class; otherwise it lies in the other class. That is:

if U.W + B > 0  =>  it is a positive sample
if U.W + B < 0  =>  it is a negative sample
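
As a tiny numeric illustration of this rule (W, B and the unknown point are chosen arbitrarily for the example):

import numpy as np

W = np.array([2.0, 1.0])   # vector perpendicular to the hyperplane
B = -4.0                   # bias term
U = np.array([3.0, 2.0])   # unknown point

score = np.dot(U, W) + B   # projection of U onto W, shifted by B
print("positive" if score > 0 else "negative")   # 3*2 + 2*1 - 4 = 4 > 0, so positive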

But this alone does not give us enough information to determine W and B, so let's think of a way to do that. Let X+ denote a positive sample and X- denote a negative sample.

Let's require that
W.X+ + B >= 1
W.X- + B <= -1

Let Yi = +1 for positive samples and -1 for negative samples. Multiplying both inequalities by Yi collapses them into a single constraint (the two cases become identical thanks to the extra variable):
Yi(W.Xi + B) >= 1
i.e.
Yi(W.Xi + B) - 1 >= 0

with equality, Yi(W.Xi + B) - 1 = 0, for the points nearest to the decision plane.   ---> (1)

Consider the two data points X+ and X- lying closest to the decision boundary. The width of the margin is given by (X+ - X-).W/||W||.

Substituting Equation (1) for these two points, this simplifies to 2/||W||.

Our aim is to maximize this width, and hence minimize ||W||, subject to the above constraints. This is also equivalent to minimizing 0.5*||W||^2.

Solving this constrained problem with the help of Lagrange multipliers gives
W = Sum_i(alpha_i * Yi * Xi)   ---> (2)
Sum_i(alpha_i * Yi) = 0   ---> (3)
Passing these equations to an optimization framework yields the optimal values of W and B.
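
In practice we can let an off-the-shelf solver perform this optimization. Here is a minimal sketch using scikit-learn's SVC with a linear kernel (assumed installed) on toy data, reading back W and B and checking the margin width 2/||W||.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [1, 2],    # negative samples
              [4, 4], [5, 5], [4, 5]])   # positive samples
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

W = clf.coef_[0]
B = clf.intercept_[0]
print("W =", W, "B =", B)
print("margin width =", 2 / np.linalg.norm(W))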

If we have data that is not linearly separable, we need a transformation that makes it linearly separable. There are various kernels that can be used to perform such a transformation, as the sketch below illustrates.
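
For instance, here is a small sketch (assuming scikit-learn) in which an RBF kernel handles the classic XOR pattern, which is not linearly separable; the kernel parameters are illustrative.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])    # XOR labels: no straight line separates them

clf = SVC(kernel="rbf", gamma=2.0, C=10.0)
clf.fit(X, y)
print(clf.predict(X))         # expected: [0 1 1 0]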

Now let's talk about something different: a feature descriptor called the Histogram of Oriented Gradients (HOG). HOG feature descriptors are frequently used in object detection, and today we will be using HOG along with an SVM for object detection.
First of all, let's begin by asking: what is a feature descriptor?
A feature descriptor is a representation of an image that keeps only the information useful for a particular task. Note that it is not a compression technique, as we will not be able to recreate the image from its descriptors.

Procedure to calculate the HOG of an image:
1. Compute the gradient image using the Sobel X and Sobel Y operators.
2. Divide the image into small cells (e.g. 8x8 pixels) and compute a histogram of gradients in each cell, based on the direction and magnitude of the gradient.
3. Normalize the histograms over blocks of adjacent cells in order to obtain a more robust shape descriptor.
4. Concatenate all the resulting vectors into a single one-dimensional vector containing the HOG feature.

The resulting feature vectors from many images can then be used to train an SVM, giving a classifier that distinguishes between the different classes of images.
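
Here is a minimal sketch of that pipeline using scikit-image and scikit-learn (both assumed installed); the random patches stand in for real training crops, e.g. crops containing a treble clef versus background crops.

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patch):
    # patch: a grayscale image patch, e.g. 64x64 pixels
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholders: real positive/negative patches would come from labelled data
positive_patches = [np.random.rand(64, 64) for _ in range(20)]
negative_patches = [np.random.rand(64, 64) for _ in range(20)]

X = np.array([hog_features(p) for p in positive_patches + negative_patches])
y = np.array([1] * len(positive_patches) + [0] * len(negative_patches))

clf = LinearSVC()
clf.fit(X, y)
# At detection time, slide a 64x64 window over the sheet, compute its HOG
# feature vector and classify it with clf.predict(...)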





I have used HOG features along with an SVM to detect treble clefs in sheet music.
There were hardly any false positives, but some clefs were not detected. I hope to resolve this either by using more data or by building multiple classifiers.

The image shows one of the best results obtained by the program. I hope to use this code to build a complete OMR system.