A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples.
While there exist many hyperplanes separating the given data, an SVM outputs the optimal hyperplane dividing the two classes. The optimal hyperplane is the one with the largest minimum distance to the training examples.
Given a hyperplane, how do we build a decision rule that uses it as a decision boundary?
Let's consider a vector W which is perpendicular to the hyperplane and starts from the origin. Let U denote the vector joining the origin and an unknown point. Consider the projection of this unknown vector U onto W, given by U.W.
If this projection is small enough the unknown point lies on one side of the hyperplane (one class); otherwise it lies on the other side (the other class).
i.e. if U.W + B > 0 => it is a positive sample,
and if U.W + B < 0 => it is a negative sample.
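As a quick illustration, here is a minimal sketch of this decision rule in Python. The weight vector W and bias B below are made-up values; in practice they come out of the training procedure described next.

```python
import numpy as np

# Made-up parameters for illustration; real values come from training.
W = np.array([0.4, -1.2])
B = 0.5

def classify(u, W, B):
    """Decision rule: project the unknown point U onto W and add the bias B."""
    return +1 if np.dot(W, u) + B > 0 else -1

unknown_point = np.array([1.0, 2.0])
print(classify(unknown_point, W, B))  # prints +1 or -1 depending on the side
```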
But this alone does not give us enough information to find W and B.
Let's think of a way to do that.
Let X+ denote a positive sample and X- denote a negative sample.
Let's require that W.X+ + B >= 1
and also that W.X- + B <= -1.
Let us denote Yi = +1 for positive samples and -1 for negative samples.
Multiplying each of the above inequalities by the corresponding Yi gives
Yi(W.Xi + B) >= 1
in both cases: the two inequalities collapse into one because of the label variable Yi. Equivalently,
Yi(W.Xi + B) - 1 >= 0,
with Yi(W.Xi + B) - 1 = 0 for the points nearest to the decision plane. ---> (1)
Consider the two data points X- and X+ lying closest to the decision boundary.
The width of the margin is given by (X+ - X-).W/||W||.
Since W.X+ + B = 1 and W.X- + B = -1 for these nearest points (Equation 1), (X+ - X-).W = 2, so the width simplifies to 2/||W||.
Our aim is to maximize this width, and hence minimize ||W||, subject to the above constraints.
This is equivalent to minimizing 0.5||W||^2, which is mathematically more convenient.
We solve this with the help of Lagrange multipliers, i.e. we minimize L = 0.5||W||^2 - Sum(alpha_i * [Yi(W.Xi + B) - 1]). Setting the derivatives with respect to W and B to zero gives
W = Sum(alpha_i * Yi * Xi) ---> (2)
Sum(alpha_i * Yi) = 0 ---> (3)
These equations, together with the constraints, can be passed to an optimization framework (a quadratic programming solver) to obtain the optimal values of W and B.
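In practice we rarely solve this quadratic program by hand; libraries such as scikit-learn wrap the whole optimization. Below is a minimal sketch using a linear SVC on a made-up toy dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable clusters (values made up for illustration).
X = np.array([[1, 1], [2, 1], [1, 2],         # positive samples
              [-1, -1], [-2, -1], [-1, -2]])  # negative samples
y = np.array([+1, +1, +1, -1, -1, -1])

# A linear SVM; the solver handles the constrained optimization internally.
clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # the learned W and B
print(clf.predict([[1.5, 1.5]]))   # expected to fall on the positive side
```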
If we have data which is not linearly separable, we need a transformation that makes the data linearly separable. There are various kernels that can be used to perform such a transformation implicitly.
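For example, in scikit-learn the kernel is just a parameter. The sketch below uses the RBF kernel on a small XOR-style dataset (made up for illustration) that no single hyperplane can separate in the original space.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: not separable by any single hyperplane in 2-D.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([-1, -1, +1, +1])

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where a separating hyperplane does exist.
clf = SVC(kernel='rbf', gamma=2.0, C=10.0)
clf.fit(X, y)

print(clf.predict(X))  # should recover [-1, -1, +1, +1]
```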
Now let's talk about something different. Let's discuss a feature descriptor called the Histogram of Oriented Gradients (HOG). HOG feature descriptors are very frequently used in object detection, and today we will be using HOG along with SVMs for object detection.
First of all, let's begin by asking the question: what is a feature descriptor?
A feature descriptor is a representation of an image that keeps only the information useful for a particular task. Note that it is not a compression technique, as we will not be able to recreate the image from its descriptors.
Procedure to calculate the HOG of an Image
1. Calculate the gradient image using the Sobel X and Sobel Y operators.
2. Divide the image into small cells (e.g. 8x8 pixels) and calculate a histogram of gradients for each cell, binned by gradient direction and weighted by gradient magnitude.
3. Normalize the histograms over blocks of adjacent cells in order to obtain a more robust description of local shape.
4. Concatenate all the resulting histograms into a single one-dimensional HOG feature vector.
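As a sketch, skimage's hog() function implements roughly these steps (gradients, cell histograms, block normalisation, concatenation). The file name below is a placeholder for an image of your own.

```python
from skimage import io, color
from skimage.feature import hog

# Placeholder path; replace with your own image.
image = color.rgb2gray(io.imread('sheet_music_patch.png'))

# Gradients, 8x8-pixel cells, 9 orientation bins, normalisation over 2x2 cell blocks.
features = hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm='L2-Hys')

print(features.shape)  # one long 1-D HOG feature vector
```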
The resulting feature vectors from a set of labelled images can then be used to train an SVM, giving a classifier that distinguishes between the classes of interest.
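Putting HOG and the SVM together, a minimal training sketch might look like the following. The random patches are placeholders standing in for real labelled cut-outs (for example clef versus non-clef regions).

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder data: random 64x64 "patches" standing in for real labelled cut-outs.
positive_patches = [rng.random((64, 64)) for _ in range(20)]
negative_patches = [rng.random((64, 64)) for _ in range(20)]

def extract_hog(patch):
    # One HOG vector per patch, mirroring the procedure described above.
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

X = np.array([extract_hog(p) for p in positive_patches + negative_patches])
y = np.array([1] * len(positive_patches) + [0] * len(negative_patches))

# A linear SVM on top of the HOG vectors gives the final classifier.
clf = LinearSVC(C=1.0)
clf.fit(X, y)
print(clf.predict(X[:3]))
```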
I have used HOG features along with an SVM to detect treble clefs in sheet music.
There were hardly any false positives, but some clefs were not detected. I hope to resolve this either by using more data or by building multiple classifiers.
The image shows one of the best results obtained by the program. I hope to use this code to build a complete OMR (Optical Music Recognition) system.