After a long three-hour wait, it was my turn to enter the interview room. The first question the interviewer asked me was “Can you estimate the total number of cigarettes consumed per month in India?” Having worked on a project for ITC in one of my core courses, I was able to crack the problem with relative ease. I started with the total number of ITC factories in India. From there, I calculated the number of cigarettes ITC manufactures in a year with the help of the average turnover time. Further, I made reasonable guesses on the percentage of cigarettes exported and on ITC's market share in India. Finally, I arrived at the number of cigarettes consumed per month in India, which convinced the panel.
Questions like these are very common in analytics and management consulting interviews. If you wish to interview with companies of this kind, you should be able to solve guess estimates (or guesstimates, as we’ll call them from here on) in double-quick time, and this article should help you do that. I was fortunate to get this particular puzzle. What if I had had no clue about the number of ITC factories producing cigarettes?
After this interview, I solved many such puzzles to get comfortable with them. In this article, I will walk through some techniques I now use to crack such puzzles.
What does an interviewer evaluate using a guesstimate case study?
Clients very often expect an Analyst or Consultant to provide a quick initial sizing of potential projects. This is why such questions are so common in interviews for these roles. The interviewer is looking for four key traits in this interview.
- How structured is your approach?
- How comfortable are you with numbers?
- Are you able to make quick checks on the efficiency of different methods?
- Can you do quick mental calculations and validate the magnitude of the numbers?
Framework to solve a guesstimate problem
Knowing certain techniques used for such guess estimates helps keep your approach structured in the interview. Let’s address the cigarette estimate problem from the demand side (without using the number of ITC factories) while discussing the key techniques. Following are the 4 key techniques which will help you in such case interviews:
- Find the right proxy: This is by far the most important technique. A proxy is a parameter which behaves in a similar manner to the dependent parameter. In the cigarette estimation problem, the population of India is a good proxy for the number of cigarettes consumed monthly in India. If the population of India increases, it can be safely said that cigarette consumption will increase proportionally. Other commonly used proxies are growth in population, growth in demand for a newly introduced technology, the average number of planes parked at major airports, etc.
- Segment till you can find differentiated clusters: Estimating parameters at the segment level is far more accurate than making guesses on the overall population. In the cigarette estimation problem, the population below 16 years can safely be ignored for cigarette consumption, and the female population is expected to have a lower average cigarette consumption than the male population. This is how segmentation helps in making accurate assumptions.
- Do smart calculations and round off numbers: Speed is critical in such problems, and one needs to maintain a balance between accuracy and time spent. Say you need to find 2999/3. It is much easier to calculate 3000/3 than 2999/3. In such cases, write the answer as 1000 (-). This indicates that the number is slightly less than 1000, which can be compensated for in further calculations.
- Validate number magnitude: It is always a good idea to keep validating intermediate numbers using your experience and sense checks.
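The signed round-off idea from the third technique can be sketched in a few lines of Python. This is only an illustration, not part of the original method: the helper name `round_friendly` and the `"+"` / `"-"` / `"="` markers are assumptions introduced here, with `"-"` meaning the true value is slightly less than the rounded one, as in the 1000 (-) notation above.

```python
import math

def round_friendly(value, sig_digits=1):
    """Round a positive number to `sig_digits` significant digits and
    report in which direction the true value lies (hypothetical helper)."""
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    magnitude = math.floor(math.log10(value))
    factor = 10 ** (magnitude - sig_digits + 1)
    rounded = round(value / factor) * factor
    if rounded > value:
        sign = "-"   # true value is slightly less than the rounded one
    elif rounded < value:
        sign = "+"   # true value is slightly more than the rounded one
    else:
        sign = "="   # no rounding error
    return rounded, sign

rounded, sign = round_friendly(2999)
quotient = rounded / 3
print(f"2999/3 is roughly {quotient:.0f} ({sign})")
```

Carrying the sign along with the number makes it easy to see, at the end of a long chain of calculations, whether the estimate is likely too high or too low.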
Some ground rules to be followed while doing a guesstimate
Following are some factors one should keep in mind while solving a guess estimate problem:
- Analyze all possible uses of the subject. For example, while estimating the number of tennis balls in India, one should consider balls being used in tennis, cricket and all other sports which are potential users of tennis balls.
- Keep the population of your country, state and city at your fingertips. As population is the most common proxy in many case studies, such numbers give a good starting point.
- Know some key parameters for airline management: Many guess estimate problems are related to airlines. A sense of the number of flights that normally stay parked at major airports, the time lag between flight take-offs, etc. helps.
- Draw neat diagrams to show the segmentation. This not only helps you do calculations quickly but also makes it easier to redo the calculations at the segment level if required.
- Don’t always round off in the same direction. Doing so magnifies the error term. Putting a sign in front of each rounded-off number helps you keep track.
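The last rule is easy to check numerically. The sketch below compares the relative error of a product when every factor is rounded up with the error when the rounding directions are mixed. The three factors are arbitrary numbers chosen only for illustration and do not come from any of the cases in this article.

```python
import math

def relative_error(exact_factors, rounded_factors):
    """Relative error of the product of rounded factors vs. the exact product."""
    exact = math.prod(exact_factors)
    approx = math.prod(rounded_factors)
    return (approx - exact) / exact

# Arbitrary illustrative factors (assumptions, not data).
exact = [38, 4.7, 260]
all_up = [40, 5, 300]    # every factor rounded up
mixed = [40, 4.5, 250]   # one factor rounded up, two rounded down

err_up = relative_error(exact, all_up)
err_mixed = relative_error(exact, mixed)

print(f"all rounded up : {err_up:+.1%}")
print(f"mixed rounding : {err_mixed:+.1%}")
```

With these particular factors, rounding everything upwards overshoots far more than mixed rounding does, because the individual errors multiply instead of partly cancelling.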
Step-by-step approach for solving a guesstimate problem
Case 1: Estimate the number of cigarettes consumed monthly in India
Solution: A good proxy in such a problem is the population of India, i.e., 1.2 billion. Following is an effective way to segment this population:
Following were the key considerations in building the segmentation and the intermediate guesses:
- The rural population consumes far fewer cigarettes than the urban population because of the difference in purchasing power.
- Males consume more cigarettes than females in both urban and rural populations.
- Children below 16 years consume a negligible number of cigarettes.
- The male-to-female ratio in urban areas is closer to 1 than in rural areas.
- The male-to-female ratio in younger generations is closer to 1 than in older generations. This is because of the increase in awareness level.
- The bulk of the population starts smoking after getting a job, and hence the average number of cigarettes is higher in older groups.
- The total number of cigarettes estimated from the supply side also comes to around 10 Trillion, which gives a good sense check on the final number.
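The mechanics of such a segment-level estimate can be sketched as follows. Only the 1.2 billion population comes from the text above. The segment shares and the cigarettes per person per month are hypothetical placeholders, chosen merely to respect the considerations listed (under-16s ignored, urban above rural, male above female), so the resulting total is not meant to reproduce the figures quoted in this case.

```python
import math

POPULATION = 1.2e9  # proxy used in the article

# (segment, share of population, cigarettes per person per month)
# All shares and rates below are hypothetical placeholders.
SEGMENTS = [
    ("under 16",         0.30,  0),
    ("urban male 16+",   0.11, 30),
    ("urban female 16+", 0.10,  5),
    ("rural male 16+",   0.25, 10),
    ("rural female 16+", 0.24,  1),
]

def estimate_monthly_consumption(population, segments):
    """Sum segment-level estimates after checking the shares cover everyone."""
    total_share = sum(share for _, share, _ in segments)
    if not math.isclose(total_share, 1.0):
        raise ValueError(f"segment shares add up to {total_share}, not 1")
    return sum(population * share * rate for _, share, rate in segments)

monthly_cigarettes = estimate_monthly_consumption(POPULATION, SEGMENTS)
print(f"Estimated cigarettes per month: {monthly_cigarettes:.2e}")
```

Keeping each segment as a separate row makes it easy to revise one assumption, for example the rural male rate, and redo the total without touching the rest.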
Case 2: Estimate the number of WhatsApp installations on Android phones
Solution: A good proxy in this problem is the world population, i.e., ~7.2 Billion. Following is a possible approach to this problem:
The actual number of WhatsApp installations on Android phones is slightly more than 100 Million. As this example shows, guess estimates can be fairly accurate if we choose good segments and approximations.
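One way to structure this kind of estimate is as a funnel: start from the proxy and apply one narrowing fraction at a time, keeping every intermediate value so that its magnitude can be sense-checked. In the sketch below, only the ~7.2 billion world population comes from the text. The step labels and fractions are hypothetical placeholders that show the mechanics and are not the figures used in the original solution.

```python
def funnel(base, steps):
    """Apply each (label, fraction) step in turn and keep every intermediate value."""
    trail = [("base", base)]
    value = base
    for label, fraction in steps:
        if not 0 <= fraction <= 1:
            raise ValueError(f"fraction for '{label}' must lie between 0 and 1")
        value *= fraction
        trail.append((label, value))
    return trail

WORLD_POPULATION = 7.2e9  # proxy used in the article

# Hypothetical placeholder fractions.
STEPS = [
    ("owns a smartphone",       0.25),
    ("smartphone runs Android", 0.50),
    ("has WhatsApp installed",  0.10),
]

trail = funnel(WORLD_POPULATION, STEPS)
for label, value in trail:
    print(f"{label:<26}{value:.2e}")

estimate = trail[-1][1]
```

Printing the whole trail rather than only the final number supports the "validate number magnitude" technique: an implausible intermediate value points directly at the assumption that needs revisiting.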
Case 3: Estimate the number of tennis balls bought in India per month
Solution: A good proxy in this problem is the number of cities in India, i.e., ~1700. The catch in this problem is to identify all the places where tennis balls are used. Once we have the number of tennis balls in use, we can easily find the number of tennis balls bought in a month using the lifetime of a tennis ball.
Following is an effective way to segment this population:
Following were the key considerations in building the segmentation and the intermediate guesses:
- Rural areas have a negligible number of tennis courts.
- Metro cities have the highest number of sectors.
- For each sector in metro cities, the number of grounds for both tennis and cricket is higher. This is because of both the bigger area and the higher buying capacity in metros.
- The number of balls consumed per ground is higher in metros because of the higher engagement there.
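The step from balls in use to balls bought per month can be sketched as below. Only the ~1700 cities come from the text. The split into metro and other cities, the sectors, grounds and balls per ground, and the two-month ball lifetime are hypothetical placeholders that merely follow the direction of the considerations above (metros have more sectors, more grounds per sector and more balls per ground).

```python
def monthly_purchases(cities, sectors_per_city, grounds_per_sector,
                      balls_per_ground, ball_lifetime_months):
    """Balls bought per month = balls in use / lifetime of a ball in months."""
    balls_in_use = cities * sectors_per_city * grounds_per_sector * balls_per_ground
    return balls_in_use / ball_lifetime_months

BALL_LIFETIME_MONTHS = 2  # hypothetical placeholder

# Hypothetical placeholder segments; the two city counts add up to ~1700.
metro = monthly_purchases(cities=10, sectors_per_city=100, grounds_per_sector=4,
                          balls_per_ground=20,
                          ball_lifetime_months=BALL_LIFETIME_MONTHS)
other = monthly_purchases(cities=1690, sectors_per_city=10, grounds_per_sector=1,
                          balls_per_ground=10,
                          ball_lifetime_months=BALL_LIFETIME_MONTHS)

total = metro + other
print(f"metro: {metro:,.0f}  other cities: {other:,.0f}  total: {total:,.0f}")
```

Dividing by the lifetime is what converts a stock (balls currently in play) into a flow (balls bought each month), which is the distinction the solution relies on.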
A challenge for the reader
Here is a practical example you can try. Imagine you are sitting in an interview and the interviewer asks “Estimate the number of aircraft in the air across the globe at this moment in time.” How will you answer this question? Write down your approach in the comment box below to get opinions from experts.
End Notes
Guess estimates are among the most common case studies asked in data science interviews. With the right tools and techniques, this case study becomes a cakewalk.
Did you find the article useful? Share with us any other techniques you incorporate while solving a guess estimate problem. Do let us know your thoughts about this article in the comment section below.
Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industries including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.
This one is really helpful, as these types of questions are very frequent in Analyst interviews. Great work!!!
Hi Tavish, I liked the way you have presented your points in the article. It's very well organized and makes some interesting observations (especially about the closest proxy). However, I must say that I disagree with you on the relevance of the guesstimate problems for an analytics professional. The main objective of the entire analytics endeavor is to take "guessing" out of the game. I am not sure what questions are being asked in Analytics Consulting interviews in India, but I am sure there are better ways to evaluate an individual's structured approach to problem solving, his/her comfort with numbers (in a statistical sense as opposed to raw arithmetic) and his/her ability to evaluate the efficiency of different algorithms based on the problem at hand. IMHO, Tim Peters said it best when he wrote one of the tenets in the Zen of Python: "In the face of ambiguity, refuse the temptation to guess." Guesstimate questions have traditionally been used by Management Consulting companies to test the candidate's ability to think on their feet. In contrast, all the Data Scientists/Analysts that I have met are way more critical and deliberate in their thinking. I think that is one of the most valuable traits of an analytics professional. But by all means, this is my personal opinion. Best, Ayush
Ayush, Thank you for your elaborate comment. I agree with you that guess estimates were traditionally used by Management Consulting companies. They are still equally important for them. For analytics companies in India they have become quite popular in the recent past. I say this based on my experience and the conversations I have had with people who recruit day in and day out in analytics. The reason they are so popular in Indian analytics companies is that analytics is still in its nascent stage. New hires have to make their own path and influence people in the industry who are still hesitant to implement strategies driven by numbers. To assess such capability we need people with skills very close to those of a management consultant, where business problems are not very well defined and the client is not very keen on accepting fact-based strategy changes. Also, expertise in such problems gives a candidate a comfort level with segmentation, which is the heart and soul of the analytics industry. This is my perception of the Indian analytics industry. I am still open to discussion on the relevance of such case studies in analytics interviews. Talking from my personal experience, I have been asked such a question in every interview I have appeared for till date. Tavish