Business Case Study Assignments For Entry Level Data Analysts

Chrisx10 Last Updated : 12 Oct, 2024

8 min read

Introduction

Why do companies conduct business case studies along with interviews? Why not just be done with interviews and save time and effort? Companies want to hire data analysts who can apply theoretical principles to solve practical problems, find solutions, and reason deductively. Not everyone reasons deductively; most people are inductive, learning from experiences, anecdotes, observations, and patterns. Case studies can gauge how candidates reason, help identify problem solvers, and support sound hiring decisions.

In the real world, unlike bootcamps and Coursera mini projects, data is messy, scarce, sporadic, sparse, incoherent, and very rarely as clean as expected. Even when data is clean, it calls for a deeper understanding of where it was sourced from, how it can be leveraged, etc. So companies want to understand what a data analyst would do in a scenario where nothing seems right and everything is ambiguous.

Most data analysts solve open-ended problems with multiple open-ended solutions. There is no one right solution; if someone says there is one, it could be out of ignorance or incompetence. Data analysts are expected to:
1. Give life to a new analysis (0 to 1).
2. Work on something that already exists and enhance it.
3. Repeat the same task daily, weekly, monthly, etc.
4. Solve new problem statements daily, weekly, and monthly.

Broadly, these convert into four combinations: a new (0 to 1) or an existing analysis, paired with either a repeated task or a new problem.

Learning Objectives:

Describe the case study. E.g., Business sales, Loans, etc.
Identify the key issues and problems.
Analyze the case study using theoretical principles of statistics and data science.
Recommend the right course of action for each issue.

This article was published as a part of the Data Science Blogathon.


Advantages of Using Case Studies for Data Analysts

Problem-solving (PS) – PS plays an important role for data scientists (DS), data analysts (DA), and business analysts (BA). The effort spent on PS varies across organizations and projects: some teams spend about 80% of their time on it, while others spend about 20%.
Analytical Tools, Quantitative And/or Qualitative, Depending on the Case – Various techniques and approaches are available, and deciding which method/framework fits the bill also depends on the data analyst.
Decision-making in Complex Situations – With multiple stakeholders churning out multiple solutions, the final say rests with the data analyst. Turning complex situations into easy-to-understand business solutions is one of the key abilities of a data analyst.
Coping with Ambiguities – Most problems that need to be solved are not clear beforehand; sometimes, a lot of effort goes into framing the problem statement before actually trying to solve it using ML models or analytics.

Business case studies are excellent opportunities for candidates to showcase these abilities and stand out from the crowd. Here problem-solving analysts differentiate themselves from task-driven data analysts.

Guidelines to Solve Case Studies for Data Analysts

What is the issue?
1. Identifying the reason for the analysis provides a lot of clarity on how to carry it out.
2. Issues can be how to reduce losses (customer loss, growth loss, subscriber loss, monetary loss, etc.) or how to optimize a metric, for example, sales conversion.
3. How to grow sales conversion by X percent in the next 6 months can also be considered an issue.
4. Any objective that has a positive business impact can be considered.
What is the goal of the analysis?
1. Once the issue is known, the goal is to use data science techniques to solve it.
2. There can be multiple goals as well; for example, grow sales by X% in the North and by Y% in the South. The same technique can be used to reach both goals.
3. Visualize the end goal in a tabular form or as a flowchart. This will provide a lot of clarity while coding and solving it.
What is the context of the problem?
1. Why solve this problem?
2. What’s the overall impact (for example, a 35 Cr profit lift for the month of Jan 23)?
3. Which business will be impacted?
4. Who is the stakeholder?
5. What is the budget/cost/investment involved?
What would you recommend — and why?
1. Take a stand, and prove your point, backed with data.
2. Recommendations are practical ideas based on insights.
3. Provide multiple recommendations and provide pros and cons for each.

Finally, even though the final recommendation is important, the approach and reasoning used to get there are equally important.

Sales Leads and Conversions

Source: https://stevepatrizi.com/

The sales funnel is a prospective customer’s path to becoming a customer. At the top, there are a lot of users (potential customers), and as the funnel moves down, its size reduces. This is where the sales team comes into the picture. They convert these potential customers into customers and subscribers. Sales teams receive information like name, email id, mobile number, gender, age, etc.

But wouldn’t it be better if a conversion score were given? For example, user A has a higher potential to convert than user B, so the team can focus its time and effort on user A. Or the team could be given insights on user cohorts that consume the least amount of time but have a high potential to convert.

Analytics and Data science can answer these questions for the sales team. In this problem statement, a sales team has approached the analytics team to answer and solve some of its pain points.
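
To make the funnel idea concrete, here is a minimal sketch of how stage-to-stage conversion can be computed. The stage names and counts are made up for illustration and are not taken from the case study data.

```python
# Hypothetical funnel counts; stage names and numbers are illustrative only.
funnel_stages = [
    ("visited_app", 10_000),
    ("booked_free_consultation", 1_200),
    ("attended_consultation", 800),
    ("purchased_subscription", 60),
]


def stage_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


conversions = stage_conversions(funnel_stages)
for prev_name, name, rate in conversions:
    print(f"{prev_name} -> {name}: {rate:.1%}")

# End-to-end conversion from the top of the funnel to a purchase
overall_rate = funnel_stages[-1][1] / funnel_stages[0][1]
print(f"overall: {overall_rate:.2%}")
```

The stage with the sharpest drop is usually where the sales team's time is best spent, which is the kind of question the sales team is asking here.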

Problem Statement (PS) – Improving the Efficiency of Sales Conversion

The below PS is an original case study assignment given by an established healthcare start-up for the senior business analyst role.

This PS falls into 0 to 1 and New Problems.

Attached is a dump of slots booked by free users with our coaches over 2 months. The leads are separated by funnel (Bot / Free-Trial) and lead type (with/without medical condition and India/NRI). All consultations are free, and the objective is to upsell premium subscriptions. Purchases can be identified with the payment time filter. Coaches are divided into target classes based on their ability to sell, ‘A’ being the best in sales and ‘D’ being the worst. The data of the actual users and coaches has been masked.

1. What’s the 3-day and 7-day conversion of different lead types split by funnels?
2. What hours work best for connectivity and sales?
3. Come up with insights on how best you would optimize slots, coach, and funnel for the most efficient outcome.
4. Are there any other insights you can come up with using this data?

Load and summarise data:

import pandas as pd
from IPython.display import display

health_df = pd.read_csv('https://raw.githubusercontent.com/chrisdmell/Project_DataScience/working_branch/03_healthifyme_case_study/sales_call_data_dump.csv')

print("*** Shape of the df ***")
display(health_df.shape)

print("*** HEAD ***")
display(health_df.head())

# Number of columns that contain at least one null
print("*** NULL CHECK (columns with nulls) ***")
display(health_df.isnull().any().sum())

# Number of nulls in each column
print("*** NULL CHECK (nulls per column) ***")
display(health_df.isnull().sum())

# info() prints its summary directly and returns None, so it is not wrapped in display()
print("*** df INFO ***")
health_df.info()

print("*** DESCRIBE ***")
display(health_df.describe())
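
The two null checks answer different questions, which is easy to miss on a wide table. The sketch below uses a small made-up frame (not the real dump) to show the difference, and also shows that a null payment_time is informative here, since a purchase is identified by the payment time being present.

```python
import pandas as pd

# Toy frame standing in for the sales dump; all values are made up.
toy_df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "funnel": ["BOT", "FT", None, "FT"],
    "payment_time": [None, "2022-01-05 10:00:00", None, None],
})

# How many columns contain at least one null
cols_with_nulls = int(toy_df.isnull().any().sum())

# How many nulls each column contains
nulls_per_column = toy_df.isnull().sum()

# Share of rows with a payment, i.e. a rough overall conversion rate
paid_share = toy_df["payment_time"].notnull().mean()

print(cols_with_nulls)
print(nulls_per_column)
print(paid_share)
```

In a case study, it is worth stating explicitly which nulls are data quality problems (a missing funnel) and which carry meaning (no payment yet).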

Column Names:

Explanations of the columns were not provided, so the descriptions below are assumptions.

expert_id – ID of the sales representative (categorical).
team_lead_id – ID of the team lead (categorical).
user_id – User ID (categorical).
India vs NRI – Whether the user is in India or is an NRI (binary).
medicalconditionflag – Whether the user has any medical condition (binary).
funnel – Source of the lead, BOT or FT (Free Trial).
event_type – The purpose of the call, either to book a consultation or a sales call.
current_status – Current status of the event.
handled_time – Conversation time between agent and user.
slot_start_time – Consultation start time.
booked_flag – Whether the consultation was booked.
payment_time – Subscription activation time.
target_class – Class of the coach by ability to sell, ‘A’ being the best and ‘D’ the worst (per the problem statement).

Convert dates into datetime objects:

time_to_pandas_time = ["slot_start_time", "handled_time", "payment_time"]
for cols in time_to_pandas_time:
  health_df[cols] = pd.to_datetime(health_df[cols])
health_df.dtypes
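
Real dumps often contain malformed or missing timestamps. As a hedged variation on the conversion above, the sketch below passes errors="coerce" to pd.to_datetime so that unparseable values become NaT instead of raising. The toy values are made up, and whether the real file needs this is an assumption.

```python
import pandas as pd

# Made-up rows: one valid timestamp, one malformed value, one missing value
raw = pd.DataFrame({
    "slot_start_time": ["2022-01-03 09:30:00", "not a date", None],
    "payment_time": ["2022-01-05 18:00:00", None, None],
})

for col in ["slot_start_time", "payment_time"]:
    # errors="coerce" turns anything unparseable into NaT
    raw[col] = pd.to_datetime(raw[col], errors="coerce")

print(raw.dtypes)
# Count how many values could not be parsed or were missing
print(raw["slot_start_time"].isna().sum())
```

Counting the NaT values before and after parsing is a quick way to report how many rows were silently dropped from later date arithmetic.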

What’s the 3-day and 7-day conversion of different lead types split by funnels?

# Reference time per lead: the later of handled_time and slot_start_time
health_df["max_start_handled_times"] = health_df[["handled_time", "slot_start_time"]].max(axis=1)
# Whole days from the reference time to payment (NaN when there was no payment)
health_df["conversion_days"] = (health_df["payment_time"] - health_df["max_start_handled_times"]).dt.days
health_df.head()

from functools import reduce

group_cols = ["team_lead_id", "funnel"]
## Total base
conversion_base = health_df.groupby(group_cols).agg(total_leads=("max_start_handled_times", "count")).reset_index()
## in 3 days how many converted
conversion_base_3 = health_df[health_df["conversion_days"] <= 3].groupby(group_cols).agg(converted_3d=("max_start_handled_times", "count")).reset_index()
## in 7 days how many converted
conversion_base_7 = health_df[health_df["conversion_days"] <= 7].groupby(group_cols).agg(converted_7d=("max_start_handled_times", "count")).reset_index()
# compile the list of dataframes you want to merge
data_frames = [conversion_base, conversion_base_3, conversion_base_7]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=group_cols, how="left"), data_frames)
df_merged["three_day_perc"] = df_merged.converted_3d / df_merged.total_leads
df_merged["seven_day_perc"] = df_merged.converted_7d / df_merged.total_leads
df_merged.head()
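
The question asks for conversion by lead type, so an alternative cut is to group by the lead-type columns (India vs NRI and medicalconditionflag) together with funnel. The sketch below does this on a made-up frame that reuses the article's column names. It also counts only payments made 0 to 3 (or 0 to 7) whole days after the reference time, which is an assumption that differs slightly from the `<= 3` filter above because it excludes payments made before the call.

```python
import pandas as pd

# Made-up leads; column names follow the article, values are illustrative only.
leads = pd.DataFrame({
    "funnel": ["BOT", "BOT", "BOT", "FT", "FT", "FT"],
    "India vs NRI": ["India", "India", "NRI", "India", "NRI", "NRI"],
    "medicalconditionflag": [True, True, False, False, True, True],
    "max_start_handled_times": pd.to_datetime([
        "2022-01-01 10:00", "2022-01-01 11:00", "2022-01-02 09:00",
        "2022-01-02 15:00", "2022-01-03 12:00", "2022-01-03 18:00",
    ]),
    "payment_time": pd.to_datetime([
        "2022-01-02 10:00", None, "2022-01-08 08:00",
        None, "2022-01-20 12:00", "2022-01-04 18:00",
    ]),
})

# Whole days to payment; NaN when the lead never paid
days = (leads["payment_time"] - leads["max_start_handled_times"]).dt.days

# between() is False for NaN, so unpaid leads count as not converted
leads["converted_3d"] = days.between(0, 3)
leads["converted_7d"] = days.between(0, 7)

lead_cols = ["funnel", "India vs NRI", "medicalconditionflag"]
summary = (
    leads.groupby(lead_cols)
    .agg(
        total_leads=("converted_3d", "size"),
        three_day_perc=("converted_3d", "mean"),
        seven_day_perc=("converted_7d", "mean"),
    )
    .reset_index()
)
print(summary)
```

Using boolean flags and a single groupby avoids the three-way merge and keeps groups with zero conversions in the output with a rate of 0 rather than NaN.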

What hours work best for connectivity and sales?

health_df["max_start_handled_times_hour"] = health_df.max_start_handled_times.dt.hour
health_df_bt = health_df[["max_start_handled_times", "max_start_handled_times_hour","funnel", "team_lead_id" ,"India vs NRI", "medicalconditionflag","payment_time","current_status"]]
## Total base: calls and payments per hour (count skips rows with no payment_time)
conversion_base_H = (
    health_df_bt.groupby(["max_start_handled_times_hour"])
    .agg({"max_start_handled_times": "count", "payment_time": "count"})
    .reset_index()
)
conversion_base_H["conversion_ratio"] = (conversion_base_H.payment_time / conversion_base_H.max_start_handled_times) * 100
conversion_base_H.sort_values(by=["conversion_ratio"], ascending=False).head()

6 AM has the highest conversion ratio, 6.13%, but only 16 people converted, so the total revenue is lower than at, say, 10 AM. The best hours should therefore be the ones where both revenue and conversion are high. Let’s assume the unit cost of the subscription to be Rs. 999.

conversion_base_H["unit_gross_amt_avg"] = 999  ## assume
conversion_base_H["gross_sales"] = conversion_base_H.payment_time*conversion_base_H.unit_gross_amt_avg
conversion_base_H.sort_values(by = ["gross_sales"], ascending = False).head()

When revenue is optimized, 11 AM is the best time for connectivity and sales. Generally, morning hours from 10 AM to 12 PM and evening hours from 5 PM to 6 PM are best, considering conversion and revenue.
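
One way to balance conversion ratio against volume is to ignore hours with too few calls before ranking. The sketch below is illustrative only: the hourly counts are made up, and the MIN_CALLS threshold is a hypothetical choice that would need to be justified on the real data.

```python
import pandas as pd

# Made-up hourly totals; not the figures from the case study
hourly = pd.DataFrame({
    "hour": [6, 10, 11, 17],
    "calls": [250, 3000, 3200, 2500],
    "payments": [16, 120, 130, 95],
})

UNIT_PRICE = 999   # assumed subscription price, as in the article
MIN_CALLS = 1000   # hypothetical volume floor for a "reliable" hour

hourly["conversion_ratio"] = hourly["payments"] / hourly["calls"] * 100
hourly["gross_sales"] = hourly["payments"] * UNIT_PRICE

# Drop low-volume hours, then rank by revenue with conversion as a tie-breaker
reliable = hourly[hourly["calls"] >= MIN_CALLS]
ranked = reliable.sort_values(["gross_sales", "conversion_ratio"], ascending=False)
best_hour = int(ranked.iloc[0]["hour"])

print(ranked)
print(best_hour)
```

In this toy data the 6 AM slot has the highest ratio but is filtered out for low volume, which mirrors the reasoning in the text.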

Come up with insights on how best you would optimize slots, coach, and funnel for the most efficient outcome. Are there any other insights you can come up with using this data?

The introduction explains that most people learn inductively, through practice and observation. Readers can attempt questions 3 and 4 on their own to improve their analytical thinking and problem-solving skills.

Some other important questions:

Build a classification model and provide a conversion propensity score for each user-agent pair.
Should the company invest more in FT or BOT? Which has the higher conversion?
It is observed that conversions are higher for users with a medical history. How can this be exploited to accrue higher revenues and subscriptions?
Is it true that Bot has a better conversion for advanced countries, and for India, a Free trial is better?
What is the cost per lead (CPL)? Assume missing data points if needed; for example, the cost can be assumed based on a quick Google search.
Which sales lead has a higher retention rate? And why?
What is the cost of acquisition (CAC) for FT and BOT (assume relevant data points if needed)?
Seven-day conversion is not much better than three-day conversion; is this because of the sales agents (are these agents’ conversions worse off)? Or is this due to some customer cohort?
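
As a starting point for the propensity question, a simple baseline is the historical conversion rate of each cohort, shrunk towards the overall rate so that small cohorts do not get extreme scores. This is not a classification model, only a sketch of a baseline to beat. The data is made up, and PRIOR_WEIGHT is a hypothetical smoothing parameter.

```python
import pandas as pd

# Made-up historical leads with a 0/1 conversion outcome
history = pd.DataFrame({
    "funnel": ["BOT", "BOT", "FT", "FT", "FT", "BOT", "FT", "BOT"],
    "medicalconditionflag": [True, False, True, True, False, True, False, False],
    "converted": [1, 0, 1, 0, 0, 1, 0, 0],
})

cohort_cols = ["funnel", "medicalconditionflag"]
PRIOR_WEIGHT = 2  # hypothetical: how many "pseudo-leads" of prior to add

overall_rate = history["converted"].mean()

cohort_rates = (
    history.groupby(cohort_cols)["converted"]
    .agg(leads="count", conversions="sum")
    .reset_index()
)
# Smoothed rate: blends the cohort's own rate with the overall rate
cohort_rates["propensity"] = (
    cohort_rates["conversions"] + PRIOR_WEIGHT * overall_rate
) / (cohort_rates["leads"] + PRIOR_WEIGHT)

# Score new (hypothetical) leads by looking up their cohort
new_leads = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "funnel": ["BOT", "FT"],
    "medicalconditionflag": [True, False],
})
scored = new_leads.merge(
    cohort_rates[cohort_cols + ["propensity"]], on=cohort_cols, how="left"
).sort_values("propensity", ascending=False)

print(scored)
```

A fitted classifier would be expected to improve on this baseline by using more features, such as the agent, the slot hour, and the lead type.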

Solving the above questions without external help will provide immense self-confidence while solving actual interview case studies.

Our Recommendations to Data Analysts

Seven-day conversion is only slightly better than three-day conversion (from 31% to 33%, from 35% to 39%), so provide attractive subscription offers once the 3-day period is over. This could increase the 7-day conversion by X% (the X% can be estimated from previously run promotional campaigns).
Run a KYC (know your customer) survey to understand why people aren’t converting after 3 days.
Identify agents whose 3-day conversion is low and retrain them.
Morning hours from 10 AM to 12 PM and evening hours from 5 PM to 6 PM are best, considering conversion and revenue.
Readers can add more recommendations.
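
The retraining recommendation can be turned into a short list of agents. The sketch below flags agents whose 3-day conversion rate falls below a floor, considering only agents with enough leads for the rate to be meaningful. The data is made up, and both MIN_LEADS and RATE_FLOOR are hypothetical thresholds.

```python
import pandas as pd

# Made-up calls: conversion_days is NaN when the lead never paid
calls = pd.DataFrame({
    "expert_id": ["e1"] * 4 + ["e2"] * 4 + ["e3"] * 1,
    "conversion_days": [1, 2, None, None, None, None, None, 5, 0],
})

MIN_LEADS = 3      # hypothetical: ignore agents with too few leads
RATE_FLOOR = 0.25  # hypothetical: 3-day rate below this triggers retraining

# between() is False for NaN, so unpaid leads count as not converted
calls["converted_3d"] = calls["conversion_days"].between(0, 3)

agent_stats = (
    calls.groupby("expert_id")
    .agg(leads=("converted_3d", "size"), three_day_rate=("converted_3d", "mean"))
    .reset_index()
)

needs_coaching = agent_stats[
    (agent_stats["leads"] >= MIN_LEADS) & (agent_stats["three_day_rate"] < RATE_FLOOR)
]
print(needs_coaching)
```

The volume floor matters: an agent with a single lead would otherwise look either perfect or hopeless.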

Conclusion

Case studies, when done right, following the steps given above, will have a positive impact on the recruiter. Recruiters aren’t looking for answers but an approach to those answers, the structure followed, the reasoning used, and business and practical knowledge. This article provides an easy-to-follow framework for data analysts using a real business case study as an example.

To summarise:

Use case studies as a practical resume to showcase strong fundamentals, tools, and techniques.
Go above and beyond, and answer a few ambiguous questions.
Provide all possible solutions (at least two), and don’t limit solutions to just one. Discuss the advantages and drawbacks of each solution.
Start with descriptive analysis, then move on to prescriptive analysis and finally provide practical recommendations that can be implemented by business teams, in this case, the sales team.
Highlight the outcome of the solution, for example – It will save 7000 man hours, increase revenue by 5%, etc.
Finally, present everything in a PPT – 1. Agenda, 2. The problem statement, 3. Supporting slide with charts and insights, 4. The solution, recommendation, and Conclusion.
Check out my article on – Solving Business Case Study Assignments For Data Scientists.
Be audacious!

Good luck! Here’s my Linkedin profile if you want to connect with me or want to help improve the article. Feel free to ping me on Topmate/Mentro; you can drop me a message with your query. I’ll be happy to be connected. Check out my other articles on data science and analytics here.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Chrisx10

Data scientist. Extensively using data mining, data processing algorithms, visualization, statistics, and predictive modeling to solve challenging business problems and generate insights. My responsibilities as a Data Scientist include but are not limited to developing analytical models, data cleaning, explorations, feature engineering, feature selection, modeling, building prototype, documentation of an algorithm, and insights for projects such as pricing analytics for a craft retailer, promotion analytics for a fortune 500 wholesale club, inventory management/demand forecasting for a jewelry retailer and collaborating with on-site teams to deliver highly accurate results on time.
