I have been going through the deep learning literature for quite some time now. I have also participated in a few challenges to get my hands dirty.
But what I enjoy the most is applying deep learning to a real-life problem, one that is part of my daily life. This is partly why I picked up the problem of chair count recognition: to finally solve a problem that had been unsolved till now!
In this article, I will cover how I defined the problem and the steps I took to solve it. Consider it a raw, uncut version of my experience as I tried to solve the problem 🙂
Let me provide a bit of background about why I wanted to count chairs in a photograph.
At Analytics Vidhya, we usually have 10-15 people in the office. But in summer, interns crowd our habitat. So, if we have to hold an all-team meeting in summer, we end up pulling chairs from all the other rooms.
Given my laziness, I thought: what if there was an algorithm that could tell us which room has an unoccupied chair? This would save us the hassle of going from one room to another in search of a chair.
This seemed to be a simple and mundane enough problem, but I saw it as a chance to try out my newly acquired prowess! Can deep learning be used to solve this problem? Well, honestly I don’t know how much it could help, but no harm in trying it out right?
Now that you know what the problem was, let me explain my thought process in solving it. We can break down the problem into four tasks:
I decided that I should go from a comparatively easy problem to a more complex problem to reach my goal. That is the reason I divided the problem into these specific four tasks. In this article, I will cover how I attempted the first two tasks and then in the subsequent article, I will show you my attempts for the next two problems.
The first and simplest task for our problem is to find out whether a picture taken in a room contains a chair. For now, I simplified the problem by ignoring the need for a video feed and manually taking pictures of the room.
For example, if I give you two images, can you tell which one is of a chair?
If you picked the first one, you guessed correctly. So how did you know?
You have probably seen a chair so many times that it is not difficult for you to infer whether there is a chair in the image or not. In short, you have prior knowledge of what a chair looks like in reality. Similarly, we can have a trained artificial neural network do the same thing for us.
By the way, we chose an artificial neural network over other algorithms because, right now, neural networks are among the most powerful, state-of-the-art techniques for image recognition problems.
So I took an out-of-the-box pre-trained neural network and applied it to these images. This network was previously trained on the ImageNet dataset, which covers a wide assortment of object classes found in the real world.
But there was an issue when I let the model recognize the object in the image: it could not correctly classify what object was present. For example, here is the output for the image given below; each tuple holds an ImageNet class ID, a label, and the predicted probability.
[[('n03179701', 'desk', 0.56483036), ('n03337140', 'file', 0.14689149), ('n04550184', 'wardrobe', 0.03918023)]]
The model predicted that the image contained a desk rather than a chair. This was disheartening, because a desk and a chair have very few similarities: a desk is much broader in shape than a chair.
As mentioned in my previous article, whenever I encounter a problem when building neural networks, I go through a stepwise approach to tackle the issue. I'll just list the steps:
Step 1: Check the architecture
Step 2: Check the hyper-parameters of the neural network
Step 3: Check the complexity of the network
Step 4: Check the structure of the input data
Step 5: Check the distribution of the data
Here, after evaluation, I found that the image input I was giving to the model was incorrect: I was not properly handling the aspect ratio of the image. To take care of this, I added some custom code that was mentioned in one of the Keras issues on GitHub. The updated image looked like this.
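The snippet from the Keras issue is not reproduced here. As a rough illustration of one common way to preserve the aspect ratio, the image can be scaled to fit inside the square network input and the leftover border padded (often called letterboxing). The sketch below only works out the arithmetic; the 224x224 input size and the helper name `letterbox_geometry` are assumptions for illustration, not part of Keras.

```python
def letterbox_geometry(width, height, target=224):
    """Return (new_w, new_h, pad_left, pad_top) for fitting an image
    into a target x target square without distorting it.

    The 224 default is an assumed network input size.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so that the longer side exactly fills the target square.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Centre the resized image; whatever is left over becomes padding.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A landscape photo keeps its proportions and gets bars above and below.
print(letterbox_geometry(1280, 720))
```

An image library would then resize the photo to `(new_w, new_h)` and paste it onto a blank square at `(pad_left, pad_top)` before it is handed to the model.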
After taking care of the issue, the model started working correctly, and its top prediction became a chair class:
[[('n02791124', 'barber_chair', 0.77817303), ('n03179701', 'desk', 0.090379775), ('n03337140', 'file', 0.033129346)]]
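To turn a prediction like this into a yes/no answer for the first task, the top predictions can be scanned for a chair label. This is a minimal sketch that reuses the two outputs shown in this article; the substring match on "chair" and the 0.5 cut-off are assumptions and a fairly crude rule, and `contains_chair` is a hypothetical helper name.

```python
def contains_chair(decoded, min_score=0.5):
    """Check one image's predictions for a confident chair label.

    decoded: list of (class_id, label, score) tuples for a single image.
    min_score: assumed confidence cut-off, chosen only for illustration.
    """
    return any(
        "chair" in label and score >= min_score
        for _class_id, label, score in decoded
    )


# Top-3 predictions copied from the outputs in this article.
before_fix = [
    ("n03179701", "desk", 0.56483036),
    ("n03337140", "file", 0.14689149),
    ("n04550184", "wardrobe", 0.03918023),
]
after_fix = [
    ("n02791124", "barber_chair", 0.77817303),
    ("n03179701", "desk", 0.090379775),
    ("n03337140", "file", 0.033129346),
]

print(contains_chair(before_fix))  # False
print(contains_chair(after_fix))   # True
```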
Now that we have recognized that our image contains a chair, the next step is to identify where in the image the chair is. Along with the chair, we also have to detect any person in the image, because that is what lets us judge whether a chair is occupied. Both of these tasks (task 2 and task 3 respectively) will help us solve the much bigger task of finding out whether a chair is occupied or not.
For this too, as with the previous task, we will use a pre-trained network that gives an acceptable score out of the box. For object detection, YOLO is currently one of the best models that runs in real time. I have covered a bit about YOLO and how it works in this article. Let us look at how we can leverage it to solve our problem.
To set up YOLO on your system, follow these simple steps:
Step 1:
git clone https://github.com/pjreddie/darknet
cd darknet
make
Step 2:
wget https://pjreddie.com/media/files/yolo.weights
Now, to run it on our problem, type the command below, replacing the last argument with the location of your own image:
./darknet detect cfg/yolo.cfg yolo.weights ../../data/image.jpg
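Assuming the command prints one `label: NN%` line per detected object, the chairs and people it finds can be tallied from the captured console output. This is only a sketch: the line format is an assumption about darknet's output, the sample text is made up for illustration rather than taken from our images, and `count_detections` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as "chair: 87%". This line format is an assumption.
DETECTION_LINE = re.compile(r"^\s*([A-Za-z ]+?):\s*(\d+)%\s*$")


def count_detections(output_text, min_confidence=50):
    """Count detections per label, ignoring those below min_confidence (%)."""
    counts = Counter()
    for line in output_text.splitlines():
        match = DETECTION_LINE.match(line)
        if match and int(match.group(2)) >= min_confidence:
            counts[match.group(1)] += 1
    return counts


# Invented sample output, for illustration only.
sample_output = """image.jpg: Predicted in 0.03 seconds.
chair: 87%
chair: 42%
person: 91%
dining table: 60%
"""

counts = count_detections(sample_output)
print(counts["chair"], counts["person"])
```

Raising `min_confidence` trades missed chairs for fewer false alarms, so the cut-off is worth tuning on your own photos.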
After applying YOLO on our images, I saw that it gave pretty good results. Let me show you some examples of what it can do.
Although we have a decent start, there are still some issues which would hinder the deployment of the project as a full-fledged product. I will list down a few of them:
The YOLO model still made some mistakes, i.e. it was not 100% accurate. For example, in the image below, even a dustbin is categorized as a person!
What if a chair obstructs the view of another chair in an image? Would our algorithm be able to identify the hidden chair? This is a point to ponder.
Along with these issues, there are more practical implementation details, such as how long our algorithm takes to recommend a solution and what kind of hardware it requires to run. All of these things certainly have to be considered before selling our algorithm as a product!
Also, as I said earlier, we have only considered the first two tasks and have not touched the next two. Our next steps will be to count the chairs in the room and then build an end-to-end product.
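As a rough illustration of how chair and person detections might later be combined, a chair could be treated as occupied when a person's bounding box covers enough of it. This is a sketch under stated assumptions, not the method of the follow-up article: the boxes are hypothetical pixel coordinates, the 0.3 overlap threshold is a guess, and both function names are made up.

```python
def overlap_fraction(chair, person):
    """Fraction of the chair box covered by the person box.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    """
    inter_w = max(0, min(chair[2], person[2]) - max(chair[0], person[0]))
    inter_h = max(0, min(chair[3], person[3]) - max(chair[1], person[1]))
    chair_area = (chair[2] - chair[0]) * (chair[3] - chair[1])
    return (inter_w * inter_h) / chair_area if chair_area > 0 else 0.0


def count_free_chairs(chairs, people, threshold=0.3):
    """Count chairs that no person overlaps by at least `threshold`.

    The threshold is an assumed value that would need tuning on real photos.
    """
    return sum(
        1
        for chair in chairs
        if all(overlap_fraction(chair, person) < threshold for person in people)
    )


# Hypothetical detections: two chairs, one person sitting on the first.
chairs = [(0, 0, 100, 100), (200, 0, 300, 100)]
people = [(20, -50, 80, 90)]

print(count_free_chairs(chairs, people))  # 1
```

A simple overlap rule like this would also count someone standing in front of a chair as sitting on it, which is one reason it can only be a starting point.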
In this article, I described my personal experience of solving a real life problem. This article covers object detection and recognition in an image; the object specifically being a chair. For recognition, we used a simple pre-trained model for predicting the object in an image. On the other hand, for detection, we used YOLO, which is a state-of-the-art real time technique for object detection.
I will continue on with chair count in the next part of the article, where we will cover how to calculate the count of chairs. I hope this will help you solve your own problem someday. Good luck!
Sir, I have taken a one-year drop to prepare for IIT, but I want to make my career in designing programs for machine learning. So sir, please guide me on how I should prepare from now on to turn my dreams into reality. I will be very thankful.
You can start right now if you want. There's a learning path specially designed for people getting started in Data science (link: https://www.analyticsvidhya.com/blog/2017/01/the-most-comprehensive-data-science-learning-plan-for-2017/ ). Try for internships and pursue technical projects. Nothing is stopping you from doing that. You can even do little projects like the one shown in this article. Good luck for your endeavors!
Hi Faizan, good to see you using YOLO for chair detection. In my own experience of detecting objects other than chairs, I used YOLO with TensorFlow. I just tried out the sample they provide and observed that the PASCAL dataset is being used. I am unsure which dataset you are using. There are two different versions: one is 2007 and the other is 2012. If you are using 2007, try 2012. If that does not help, try the COCO dataset for problem 1. If YOLO is not giving you good performance, Google has released a new API for object detection. Have a look at that too. Let me know your thoughts.
Thanks for the insights soorya. I will surely look into this when I build the next steps for the model
A suggestion. Have you thought of taking photos of the room with or without chairs and then set your model to train over this set of photos?
I initially had that idea too, but the size of dataset which I create will be small in comparison to what yolo is originally trained on. Still, I think that dataset would be helpful in finetuning the model