Thursday, 20 July 2017

Power of Ensembling

A collection of models can produce a much better result than any of the individual models. This is the basic idea behind ensembling. It is very popular in machine learning competitions and has helped win many Kaggle competitions.
Let's understand the power of a combined model with an example.

Say we have a test set of 10 samples whose ground truth is all positive.
1111111111
We have three classifiers X, Y and Z, each of which independently predicts 80% of the samples correctly.

For a majority vote with three members there are four possible outcomes:

1. All three classifiers are correct
   = 0.8 * 0.8 * 0.8 = 0.512

2. Exactly two are correct
   = 0.8 * 0.8 * 0.2 + 0.8 * 0.2 * 0.8 + 0.2 * 0.8 * 0.8 = 0.384

3. Exactly one is correct
   = 0.8 * 0.2 * 0.2 + 0.2 * 0.8 * 0.2 + 0.2 * 0.2 * 0.8 = 0.096

4. None are correct
   = 0.2 * 0.2 * 0.2 = 0.008


Hence the probability that the majority vote is correct = 0.512 + 0.384 = 0.896

This is almost a 10% increase in accuracy. In practice it is much easier to train three different models with 80% accuracy each than to train a single model with 90% accuracy.

The key factor here is that the individual models should be as uncorrelated with each other as possible. If the models are highly correlated, the independence assumption behind the math above breaks down and there won't be a substantial improvement in the accuracy of the ensemble. A quick simulation of this setup is sketched below.
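Here is a minimal sketch (my own, not tied to any library example) that simulates the three-classifier majority vote and confirms the 0.896 figure empirically. The 80% accuracy and the independence of the classifiers are the assumptions of the thought experiment above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 1_000_000          # number of simulated test samples
p_correct = 0.8               # accuracy of each independent classifier

# Each row: whether classifiers X, Y, Z were correct on that sample
correct = rng.random((n_trials, 3)) < p_correct

# Majority vote is correct when at least 2 of the 3 classifiers are correct
majority_correct = correct.sum(axis=1) >= 2

print("Simulated majority-vote accuracy:", majority_correct.mean())
# Expected: about 0.896 = 0.512 + 0.384
```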


Monday, 3 July 2017

A linear space implementation of XOR using Decision Trees (Decision Graphs)

Hi guys
An implementation of XOR (parity) using decision trees is usually considered a bad idea, since it requires exponential memory. I think that if we model it using decision graphs instead, it can be made much more efficient.

A tree-based implementation requires exponential space, as shown.


A graph-based implementation requires only linear space and is hence much easier to compute, even by hand. A sketch of such a linear-size graph is given below.
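Below is a minimal sketch (names and structure are my own, for illustration) of the idea: the parity of n bits can be represented as a chain-shaped decision graph with only two nodes per variable (one for "parity so far is even", one for "parity so far is odd"), i.e. O(n) nodes, whereas a full decision tree over n variables needs 2^n leaves.

```python
def build_parity_graph(n_bits):
    """Build a linear-size decision graph (BDD-like chain) for n-bit XOR/parity.

    Each level has at most two nodes, keyed by (bit_index, parity_so_far).
    Returns a dict mapping node -> (child_if_0, child_if_1).
    """
    graph = {}
    for i in range(n_bits):
        for parity in (0, 1):
            node = (i, parity)
            # Reading a 0 keeps the parity; reading a 1 flips it.
            child0 = (i + 1, parity) if i + 1 < n_bits else ("LEAF", parity)
            child1 = (i + 1, parity ^ 1) if i + 1 < n_bits else ("LEAF", parity ^ 1)
            graph[node] = (child0, child1)
    return graph


def evaluate(graph, bits):
    """Walk the graph on an input bit string and return its parity (XOR)."""
    node = (0, 0)
    for b in bits:
        node = graph[node][b]
    return node[1]


g = build_parity_graph(4)
print(len(g))                      # 8 internal nodes for 4 bits (2 per level)
print(evaluate(g, [1, 0, 1, 1]))   # parity of 1^0^1^1 = 1
```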




Wednesday, 21 June 2017

A Comparative Study of Some Supervised Learning Models


Gaussian Naive Bayes
Strengths
  • Requires less data than discriminative models like Logistic Regression
Weakness
  • Features must ideally be independent of one another
  • Simple representation with little opportunity for hyperparameter tuning
  • Not suitable for big datasets



Logistic Regression
Strengths
  • Works well with correlated features
  • There are many ways to regularize such a model so as to avoid overfitting
  • Unlike SVMs, new training data can easily be incorporated using online gradient descent
Weakness
  • Requires much more data to achieve good accuracies


Support Vector Machines (SVMs)
Strengths
  • Kernel Trick: Users can build in expert knowledge about the problem via engineering the kernel
  • SVMs have regularization parameters to tolerate some errors and avoid over-fitting
  • SVMs might be more robust even if the training samples have some bias
Weakness:
  • High computational costs
  • Users might need to have domain knowledge to use kernel functions


Decision Tree
Strengths
  • Decision Trees implicitly perform variable screening or feature selection
  • Decision Trees require relatively little effort from users for data preparation
  • Nonlinear relationships between parameters do not affect tree performance
Weakness
  • Decision Trees are extremely sensitive to small perturbations in the training data. A slight change can result in a drastically different tree.
  • They can easily overfit. This can be mitigated by validation methods and pruning, but a lot of research still needs to be done in this area
  • If two features explain the same thing, a decision tree picks only the better of the two and neglects the other, while many other learning algorithms use both. In this way a decision tree might not exploit all the good features available in the data

Ensemble Methods
Strengths
  • Ensemble methods average out bias
  • They help in reducing the variance
  • They are unlikely to overfit
Weakness
  • Difficult to learn correlation between classifiers from different types of learners
  • Learning time and memory constraints might be high
  • Learned concept might be difficult to understand
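As a rough, hands-on complement to the comparison above, here is a minimal scikit-learn sketch that trains the four base models plus a simple ensemble on the same data. The synthetic dataset and the hyperparameter settings are arbitrary assumptions for illustration, not results from this post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification problem (an arbitrary choice for illustration)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Ensemble (Random Forest)": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```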

Wednesday, 7 June 2017

CreateMelody

Hi folks,
Today I bought my first domain name, createmelody.com. The initial computer vision part of my proposed music entry software is almost done. I am working on the web application and learning web development along the way. Feeling really optimistic about the proposed software. I am also learning a lot about artificial intelligence and deep learning.

Tuesday, 30 May 2017

Text Detection Using Tesseract and OpenCV

Tesseract is a freely available open-source Optical Character Recognition (OCR) tool which can be used for text detection in images. Its development has been sponsored by Google, and it is one of the best open-source OCR engines available for free. Today I will be using Tesseract to detect text in sheet music and remove it, so as to enhance the OMR software that I am building. The results obtained were incredible and show a lot of promise. Here are a few of the results.


Source Image



Text Removed Image
I used a number of additional conditions on top of Tesseract, since Tesseract alone was producing detections inside the staves. I have obtained quite good results for almost all the data that I tested. This gives me a lot of motivation to proceed with this project. A rough sketch of the kind of pipeline involved is shown below.
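The snippet below is a minimal sketch, not my actual pipeline, of how one might combine pytesseract and OpenCV to find text boxes and paint them out. The file names, the confidence threshold and the white fill value are assumptions for illustration.

```python
import cv2
import pytesseract

# Load the sheet-music page in grayscale (path is a placeholder)
img = cv2.imread("sheet.png", cv2.IMREAD_GRAYSCALE)

# Run Tesseract and get word-level bounding boxes with confidences
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

cleaned = img.copy()
for i, text in enumerate(data["text"]):
    conf = float(data["conf"][i])
    # Keep only confident, non-empty detections; extra filtering (e.g. box
    # height, position relative to the staves) would go here.
    if conf > 60 and text.strip():
        x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
        cv2.rectangle(cleaned, (x, y), (x + w, y + h), 255, thickness=-1)

cv2.imwrite("sheet_text_removed.png", cleaned)
```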

Thursday, 18 May 2017

Using Optical Flow for Depth Estimation

Assume a stationary scene with the camera moving around. A typical example would be the view from a window on a train journey. If you observe carefully, closer objects appear to move faster than distant ones. This basic property can be used to estimate depth from videos. Optical flow can estimate the apparent velocity at each point in a video, and this velocity can be used for relative depth estimation. OpenCV provides a very good implementation of optical flow, so we can obtain a fairly accurate relative depth map from it.

A very good explanation of optical flow using the Lucas-Kanade method can be found here. While the Lucas-Kanade method is useful for computing optical flow at a sparse set of points, we are actually interested in computing optical flow vectors for the entire image, i.e. for every pixel. OpenCV ships good implementations of dense optical flow, as sketched below.
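Here is a minimal sketch of the idea using OpenCV's Farneback dense optical flow. Treating the flow magnitude as a proxy for inverse depth is exactly the assumption described above, and the frame file names are placeholders.

```python
import cv2
import numpy as np

# Two consecutive frames from a video with a moving camera (paths are placeholders)
prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow (Farneback): one 2D flow vector per pixel
# Parameters: pyramid scale, levels, window size, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Flow magnitude: for a stationary scene, closer points move faster,
# so magnitude serves as a rough proxy for inverse depth.
magnitude = np.linalg.norm(flow, axis=2)
inv_depth = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

cv2.imwrite("relative_inverse_depth.png", inv_depth)
```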

Even though the title of this post is about depth estimation using optical flow, I was unable to find a dataset with a moving camera and a stationary scene. I will update this post with the depth map computed with the help of optical flow at a later time.


Wednesday, 10 May 2017

Object Detector using SVM and HOG

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples.

While there exist many hyperplanes separating the given data, the SVM outputs the optimal one dividing the two classes: the hyperplane with the largest minimum distance to the training examples.

Given a hyperplane, how do we form a decision rule that uses this decision boundary?

Let's consider a vector W perpendicular to the hyperplane. Let U denote the vector joining the origin to the unknown point. Consider the projection of U onto W, given by U.W.
If this projection is small the unknown point lies in one class, otherwise it lies in the other class.
i.e. if U.W + B > 0  => it is a positive sample
     if U.W + B < 0  => it is a negative sample.

But this alone does not give us enough information to find W and B.
Let's think of a way to do that.
Let X+ denote a positive sample and X- denote a negative sample.

Let's require that W.X+ + B >= 1
and W.X- + B <= -1

Let Yi = +1 for positive samples and -1 for negative samples.
Multiplying the above two inequalities by Yi gives
Yi(W.Xi + B) >= 1
in both cases; the two constraints collapse into one because of the extra variable Yi.

Yi(W.Xi + B) - 1 >= 0

Yi(W.Xi + B) - 1 = 0 for the points nearest to the decision plane.   ---> (1)

Consider two data points X+ and X- lying closest to the decision boundary.
The width of the margin is given by (X+ - X-).W/||W||

Using Equation (1) this simplifies to 2/||W||

Our aim is to maximize this width, and hence minimize ||W|| subject to the above constraints.
This is equivalent to minimizing 0.5||W||^2

This constrained problem can be solved with the help of Lagrange multipliers. On solving we get
W = Sum(alpha_i * Yi * Xi)   ---> (2)
Sum(alpha_i * Yi) = 0   ---> (3)
Passing these equations to an optimization framework gives the optimal values of W and B, as the sketch below illustrates.
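As a quick sanity check of the picture above, here is a minimal sketch using scikit-learn's linear SVM on a toy dataset; `coef_` and `intercept_` play the roles of W and B. The data itself is a made-up assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters in 2D (an assumption for illustration)
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X_neg = rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e3)   # large C approximates a hard margin
clf.fit(X, y)

W = clf.coef_[0]        # normal vector of the separating hyperplane
B = clf.intercept_[0]   # offset term

# Decision rule from above: sign(U.W + B)
U = np.array([1.5, 1.0])
print("W =", W, "B =", B)
print("prediction for U:", np.sign(U @ W + B))
print("margin width 2/||W|| =", 2 / np.linalg.norm(W))
```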

If the data is not linearly separable, we need a transformation that makes it linearly separable. There are various kernels that can be used to perform such a transformation.

Now let's talk about something different: a feature descriptor called Histogram of Oriented Gradients (HOG). HOG feature descriptors are very frequently used in object detection, and today we will use HOG along with an SVM for object detection.
First of all, let's begin by asking: what is a feature descriptor?
A feature descriptor is a representation of an image that keeps only the information useful for a particular task. Note that it is not a compression technique, as we cannot recreate the image from its descriptors.

Procedure to calculate the HOG of an image
1. Calculate the gradient image using the Sobel X and Sobel Y operators.
2. Divide the image into small cells (e.g. 8x8) and calculate a histogram of gradients in each cell (based on the direction and magnitude of the gradient).
3. Normalize the histograms over blocks of adjacent cells in order to obtain a more robust shape descriptor.
4. Concatenate all the block vectors to obtain a single one-dimensional vector containing the HOG feature.

The resulting feature vectors of various images can then be fed to an SVM to obtain a classifier that distinguishes between the image classes. A minimal sketch of this pipeline is given below.
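Below is a minimal sketch of the HOG-plus-SVM pipeline using OpenCV's HOGDescriptor and scikit-learn. The window size, directory layout and labels are assumptions for illustration, not my actual treble-clef detector.

```python
import glob
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# HOG over a 64x64 window: 16x16 blocks, 8x8 block stride, 8x8 cells, 9 orientation bins
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def hog_features(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 64))
    return hog.compute(img).ravel()

# Hypothetical folders of positive (treble clef) and negative patches
pos = [hog_features(p) for p in glob.glob("clefs/*.png")]
neg = [hog_features(p) for p in glob.glob("not_clefs/*.png")]

X = np.array(pos + neg)
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LinearSVC(C=1.0)
clf.fit(X, y)

# Classify a new 64x64 patch (a full detector would slide this window over the page)
patch = hog_features("candidate_patch.png")
print("treble clef" if clf.predict([patch])[0] == 1 else "not a clef")
```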





I have used HOG features along with an SVM to detect treble clefs in sheet music.
There were hardly any false positives, but some clefs were not detected. I hope to resolve this either by using more data or by building multiple classifiers.

The image shows one of the most perfect results obtained by the program. I hope to use this code to make a complete OMR system.




Tuesday, 9 May 2017

Hough Transform

Today I will explain a very popular algorithm in computer vision known as the Hough transform. I will start by explaining how the algorithm works and then show a sample application of it for detecting straight lines in sheet music.
So let's get started.
What does Wikipedia tell you?
The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

What is meant by a parameter space?


Let me explain it by considering the case of a line.
In 2D the equation of a line can be written as
y = mx + c

Here the parameters of the line are m and c, since different values of the parameters give different lines in the xy plane.




Let's rewrite this equation as c = -mx + y.
Now we can visualize the same line as a point in the mc plane, with x and y acting as the parameters.

This is known as the parameter space in the Hough transform.
So what is the advantage of using such a space?
Each point in the xy plane maps to a line in the parameter space, and each point in the parameter space maps to a line in the xy plane. So consider two points in the xy plane and the corresponding lines in the parameter space: their point of intersection gives the m and c values of the line joining the two points. This is the main idea behind the Hough transform. However, this parameterization breaks down for vertical lines, where m becomes infinite. Hence, for a general solution, we use another parametric representation of a line,
i.e. r = x cos(theta) + y sin(theta)
Now our task is to figure out whether there is a line in a given image. Assume the image is binary, with only zeros and ones. For each nonzero point in the image we can draw a curve in the parameter space.
We can represent the parameter space with a two-dimensional array (the accumulator) and increment the cells through which each curve passes. If the points do in fact lie on a line, the cell corresponding to that line's parameters (r, theta) will accumulate a much larger value than the other cells, and from it we can recover the exact equation of the line.

The same technique can be extended to any shape that can be represented parametrically; thus we can check whether a set of points lies on a particular curve.
Let's analyze the complexity of this method. Suppose we have N points. For each point we have to update a set of cells in the parameter space, but how finely we discretize the parameter space is under our control. Finally we have to find the maximum of the two-dimensional accumulator to obtain the required parameters. Hence the overall complexity is O(N), although the constants can be quite large. A minimal OpenCV example is sketched below.
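Here is a minimal sketch of line detection with OpenCV's built-in Hough transform. The file name, the Canny thresholds and the accumulator vote threshold are placeholder assumptions.

```python
import cv2
import numpy as np

# Load a sheet-music page and extract edges (path is a placeholder)
img = cv2.imread("sheet.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)

# Accumulator resolution: 1 pixel in r, 1 degree in theta; 200 = minimum votes
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for r, theta in lines[:, 0]:
        # Convert (r, theta) back to two far-apart points on the line for drawing
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * r, b * r
        p1 = (int(x0 + 2000 * (-b)), int(y0 + 2000 * a))
        p2 = (int(x0 - 2000 * (-b)), int(y0 - 2000 * a))
        cv2.line(vis, p1, p2, (0, 0, 255), 1)

cv2.imwrite("hough_lines.png", vis)
```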





This is the result of applying the Hough line transform to sheet music. Even though multiple detections appear for a single stave line, these duplicates can be removed with some careful post-processing.








Thanks for reading, and have a great day.

Monday, 8 May 2017

Expectation Maximization Algorithm

Expectation maximization (EM) is an iterative method for finding maximum-likelihood estimates of the parameters of a statistical model. The EM algorithm is very similar to K-Means, but uses soft probability assignments instead of hard cutoffs. Today I will show how to use the EM algorithm to fit a Mixture of Gaussians (MoG). Specifically, I will use the EM algorithm to model the histogram of an image as a MoG. For simplicity, assume that the image histogram can be approximated by two Gaussian distributions.


For instance, consider the following figure. The histogram of this image is plotted below.

The histogram shows a peak for the background as well as a peak for the skin color near 200. Each of these distributions can be approximated by a Gaussian, and the EM algorithm can be used to estimate their means and variances.

Once the means and variances are computed, those values can be used to threshold out a particular color. Closed contours can then be used to extract the face alone from the thresholded region. The EM implementation from OpenCV's ml module was used directly, and the resulting means were found at 0 and 136 with standard deviations of 0 and 83. These abnormal values indicate that the distribution cannot be modeled properly by a Mixture of Gaussians with two modes.

So an attempt was made to model the distribution with three modes. The means were obtained at 200, 46 and 0 with standard deviations of 22.4, 0 and 46.55. This is a much better fit for the histogram, so three modes were used. A minimal sketch of this fitting step is shown below.
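A minimal sketch of the fit with OpenCV's EM implementation might look like the following; the image path and the choice of three clusters are assumptions mirroring the discussion above, not my exact code.

```python
import cv2
import numpy as np

# Load the image in grayscale (path is a placeholder)
img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

# EM works on samples, so feed it the pixel intensities as an N x 1 float matrix
samples = img.reshape(-1, 1).astype(np.float32)

em = cv2.ml.EM_create()
em.setClustersNumber(3)        # three Gaussian modes, as chosen above
em.trainEM(samples)

means = em.getMeans().ravel()  # one mean per mode
stds = np.array([np.sqrt(c.ravel()[0]) for c in em.getCovs()])

print("means:", means)
print("standard deviations:", stds)
```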

Now let's try thresholding the image at mean - S.D. = 200 - 22 = 178.
The white region corresponds to the skin color in the initial image.
 
Typically we would not use the EM algorithm for problems like this, since the number of points is quite large. Still, this example illustrates the power of the EM algorithm.





The associated code can be found here

Sunday, 7 May 2017

Calculating Extrinsic Camera Matrix

Most people doing research in computer vision will be aware of the camera matrices, i.e. the intrinsic and the extrinsic matrix. There is a huge amount of code available that calibrates the intrinsic matrix from chessboard images, so given a new camera it is quite easy to calibrate its intrinsics. The extrinsic matrix, on the other hand, varies from image to image and encodes the position and orientation of the camera.
Even knowing this, I always found it difficult to calculate the camera extrinsics exactly, given the orientation and position of the camera in the world. Today I found this article, which was extremely useful for understanding this matrix properly.

Let C be a column vector describing the location of the camera center in world coordinates. Let Rc be the rotation matrix denoting the camera's orientation with respect to the world coordinates.

The world-to-camera rotation is then R = Rc^T and the translation is t = -Rc^T * C, so the extrinsic matrix is [R | t]. Using these relationships we can compute the camera extrinsic matrix, as sketched below.
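Here is a minimal numpy sketch of that computation, assuming Rc and C are known; the example pose values below are arbitrary.

```python
import numpy as np

def extrinsic_matrix(Rc, C):
    """Build the 3x4 extrinsic matrix [R | t] from the camera's pose in the world.

    Rc: 3x3 rotation giving the camera's orientation in world coordinates.
    C : 3-vector giving the camera center in world coordinates.
    """
    R = Rc.T                    # world-to-camera rotation
    t = -R @ C                  # world-to-camera translation
    return np.hstack([R, t.reshape(3, 1)])

# Arbitrary example: camera at (0, 0, 5), rotated 30 degrees about the y axis
theta = np.deg2rad(30)
Rc = np.array([[ np.cos(theta), 0, np.sin(theta)],
               [ 0,             1, 0            ],
               [-np.sin(theta), 0, np.cos(theta)]])
C = np.array([0.0, 0.0, 5.0])

E = extrinsic_matrix(Rc, C)
print(E)

# Sanity check: the camera center must map to the origin of the camera frame
print(E @ np.append(C, 1.0))    # ~ [0, 0, 0]
```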

Camera Extrinsic Matrix are required in creating Multi-view Stereo Datasets

Saturday, 6 May 2017

Volumetric Reconstruction

Most of my day was spent experimenting with volumetric reconstruction. I was able to get some code working, mainly for my course project. I was unable to obtain results as good as the paper promised, and the code took a long time even when I ran it on GPUs. I am still wondering why reconstruction needs this much computational power.