This article was published as a part of the Data Science Blogathon.
Customers are the most important driver of growth and the foundation of every business. More customers translate into higher revenues and better profitability in the long run. In recent times, however, heavy competition has eroded customer loyalty to brands, and it continues to decline. How can businesses measure customer loyalty? Customer lifetime value (CLV) is one way to do so. It shows the value added by customers and helps businesses make important spending decisions. Growing CLV typically goes hand in hand with growing customer loyalty.
Customer lifetime value means different things to different businesses, and different teams use CLV in various contexts. If you want to understand more about CLV, its use cases, and its implementation in different scenarios, you are in the right place.
Questions that CLV can Answer
The waterfall chart below shows that a user is not profitable until they transact more than four times; only after the fourth transaction does the business start to profit from the user. CLV aims to optimise this customer journey and provide long-term value to the business. Another observation is that CAC (customer acquisition cost) is very high compared to the sum of all variable costs; in simple terms, customer retention is a lot cheaper than customer acquisition. In the chart, COGS stands for cost of goods sold.
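To make the break-even idea concrete, here is a minimal sketch. The figures (a CAC of 400 and a contribution of 90 per transaction, i.e. revenue minus COGS and other variable costs) are made-up assumptions, not values read from the chart; they are chosen so that the fifth transaction is the first profitable one, in line with the observation above.

```python
def transactions_to_break_even(cac, margin_per_transaction):
    """Return the first transaction count at which cumulative profit turns positive."""
    if margin_per_transaction <= 0:
        raise ValueError("margin per transaction must be positive to ever break even")
    count = 0
    cumulative = -cac  # the customer starts in the red by the acquisition cost
    while cumulative <= 0:
        count += 1
        cumulative += margin_per_transaction
    return count


# Hypothetical numbers, for illustration only
cac = 400.0      # assumed customer acquisition cost
margin = 90.0    # assumed revenue minus variable costs per transaction
break_even = transactions_to_break_even(cac, margin)
print(f"First profitable transaction: {break_even}")
```

Lowering CAC or raising the per-transaction margin both pull the break-even point earlier, which is the lever CLV work tries to optimise.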
Non-Contractual (Always a share):
Contractual (Lost for good):
CLV in a freemium setting:
YouTube works in a freemium setting: users can use its services for free, but additional services carry a subscription cost. Only a small subset of users pay for subscriptions, and their payments fund the free services for everyone else. CLV therefore helps identify high-value users and those with a higher probability of conversion. A one-month trial is another lever freemium providers use to gauge customer value. In YouTube's case, 50M subscribers help fund 2.9 billion users as of September 2021.
Modelling Approaches:
Metrics help companies gauge customer health and provide day-to-day actionable insights. A few important metrics that help make better sense of CLV are noted below; the list is not exhaustive.
Conversion rate: In Amazon's context, of all the people who visit the website, how many place an order. In Zerodha's context, of all the people who visit the app, how many place an order on BSE/NSE or MCX.
Monthly recurring revenue (MRR): Predictable revenue generated by the business across all active customers per month.
Customer lifetime value (CLV): Average revenue generated by customers over their entire relationship with the company.
Average revenue per user (ARPU): The amount of money that a company can expect to generate from an individual customer.
Churn rate: The rate at which customers stop doing business with a company over a given period.
Retention rate = 1 - Churn rate.
In a contractual setting, churn is easy to measure and straightforward to interpret. In a non-contractual setting, the definition of churn varies from firm to firm, so retention rate is often used instead as the more standard measure.
Customer acquisition cost (CAC): The average amount a company spends to acquire a customer.
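The metrics above reduce to simple ratios. The sketch below shows one way to compute a few of them; the function names and all input figures are hypothetical and only illustrate the arithmetic, and real definitions (for example, what counts as an active user or a lost customer) vary from firm to firm.

```python
def conversion_rate(orders, visitors):
    """Share of visitors who place an order."""
    return orders / visitors


def average_revenue_per_user(total_revenue, active_users):
    """Revenue a company can expect from an individual customer in the period."""
    return total_revenue / active_users


def churn_rate(customers_at_start, customers_lost):
    """Share of customers who stopped doing business during the period."""
    return customers_lost / customers_at_start


def retention_rate(churn):
    """Retention rate = 1 - churn rate."""
    return 1 - churn


def customer_acquisition_cost(acquisition_spend, new_customers):
    """Average spend needed to acquire one customer."""
    return acquisition_spend / new_customers


# Hypothetical monthly figures, for illustration only
conv = conversion_rate(orders=500, visitors=20_000)
arpu = average_revenue_per_user(total_revenue=1_000_000, active_users=4_000)
churn = churn_rate(customers_at_start=4_000, customers_lost=200)
retention = retention_rate(churn)
cac = customer_acquisition_cost(acquisition_spend=300_000, new_customers=600)

print(f"conversion={conv:.1%} ARPU={arpu:.0f} churn={churn:.1%} "
      f"retention={retention:.1%} CAC={cac:.0f}")
```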
Datasets to explore CLV:
This guide uses the Brazilian E-Commerce Public Dataset by Olist to predict CLV.
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import lifetimes

# Let's make this notebook reproducible
np.random.seed(42)
import random
random.seed(42)

import warnings
warnings.filterwarnings('ignore')
Read Data
df1 = pd.read_csv('../input/olist_orders_dataset.csv')
df2 = pd.read_csv('../input/olist_customers_dataset.csv')
df3 = pd.read_csv('../input/olist_order_payments_dataset.csv')

# Orders: one row per order, indexed by the per-order customer_id
cols = ['customer_id', 'order_id', 'order_purchase_timestamp']
orders = df1[cols]
orders = orders.set_index('customer_id')
orders.drop_duplicates(inplace=True)  # too few

# Payments: payment value per order
cols = ['order_id', 'payment_value']
payment = df3[cols]
payment = payment.set_index('order_id')
payment.drop_duplicates(inplace=True)

# Customers: maps customer_id to the stable customer_unique_id
cols = ['customer_id', 'customer_unique_id']
customers = df2[cols]
customers = customers.set_index('customer_id')

# Event log: join orders to customers on customer_id
elog = pd.concat([orders, customers], axis=1, join='inner')
elog.reset_index(inplace=True)

cols = ['customer_unique_id', 'order_purchase_timestamp']
elog = elog[cols]

# Truncate timestamps to calendar dates
elog['order_purchase_timestamp'] = pd.to_datetime(elog['order_purchase_timestamp'])
elog['order_date'] = elog.order_purchase_timestamp.dt.date
elog['order_date'] = pd.to_datetime(elog['order_date'])

cols = ['customer_unique_id', 'order_date']
elog = elog[cols]
elog.columns = ['CUSTOMER_ID', 'ORDER_DATE']

elog.info()
display(elog.sample(5))
calibration_period_ends = '2018-06-30'

from lifetimes.utils import calibration_and_holdout_data

summary_cal_holdout = calibration_and_holdout_data(elog,
                                                   customer_id_col='CUSTOMER_ID',
                                                   datetime_col='ORDER_DATE',
                                                   freq='D',  # days
                                                   calibration_period_end=calibration_period_ends,
                                                   observation_period_end='2018-09-28')
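The model is fitted on three per-customer columns: `frequency_cal`, `recency_cal` and `T_cal`. To my understanding of the `lifetimes` summary format, frequency is the number of repeat purchase days, recency is the customer's age at their last purchase, and T is the customer's age at the end of the period. The plain-Python sketch below reproduces those definitions on a toy log; the function `rfm_summary` and the toy data are hypothetical and are not the library's implementation.

```python
from datetime import date


def rfm_summary(transactions, period_end):
    """Summarise (customer_id, order_date) pairs as {id: (frequency, recency, T)} in days."""
    dates_by_customer = {}
    for customer_id, order_date in transactions:
        if order_date <= period_end:  # ignore anything after the period ends
            dates_by_customer.setdefault(customer_id, set()).add(order_date)

    summary = {}
    for customer_id, dates in dates_by_customer.items():
        first, last = min(dates), max(dates)
        frequency = len(dates) - 1          # repeat purchase days only
        recency = (last - first).days       # age at last purchase
        T = (period_end - first).days       # age at end of period
        summary[customer_id] = (frequency, recency, T)
    return summary


# Toy event log, for illustration only
toy_log = [
    ("a", date(2018, 1, 1)),
    ("a", date(2018, 3, 2)),
    ("a", date(2018, 3, 2)),  # second order on the same day counts once
    ("b", date(2018, 6, 1)),
]
summary = rfm_summary(toy_log, period_end=date(2018, 6, 30))
for customer_id, values in sorted(summary.items()):
    print(customer_id, values)
```

A one-time buyer such as customer "b" ends up with frequency 0 and recency 0, which is the case for most customers in a non-contractual dataset.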
from lifetimes import ModifiedBetaGeoFitter

mbgnbd = ModifiedBetaGeoFitter(penalizer_coef=0.01)
mbgnbd.fit(summary_cal_holdout['frequency_cal'],
           summary_cal_holdout['recency_cal'],
           summary_cal_holdout['T_cal'],
           verbose=True)
print(mbgnbd)
t = 90  # days to predict in the future

# Expected number of purchases per customer over the next t days
summary_cal_holdout['predicted_purchases'] = mbgnbd.conditional_expected_number_of_purchases_up_to_time(
    t,
    summary_cal_holdout['frequency_cal'],
    summary_cal_holdout['recency_cal'],
    summary_cal_holdout['T_cal'])

# Probability that each customer is still active
summary_cal_holdout['p_alive'] = mbgnbd.conditional_probability_alive(
    summary_cal_holdout['frequency_cal'],
    summary_cal_holdout['recency_cal'],
    summary_cal_holdout['T_cal'])

# Rescale so that the most likely active customer has p_alive = 1
summary_cal_holdout['p_alive'] = np.round(
    summary_cal_holdout['p_alive'] / summary_cal_holdout['p_alive'].max(), 2)

display(summary_cal_holdout.sample(2).T)
from lifetimes.plotting import plot_period_transactions

ax = plot_period_transactions(mbgnbd, max_frequency=7)
ax.set_yscale('log')
sns.despine();
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

plot_calibration_purchases_vs_holdout_purchases(mbgnbd, summary_cal_holdout)
sns.despine();
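from lifetimes.plotting import plot_incremental_transactions

# Assumption: t_cal is the length of the calibration period in days,
# measured from the first order in the event log
t_cal = (pd.to_datetime(calibration_period_ends) - elog['ORDER_DATE'].min()).days

# Plot over the calibration period plus the t-day holdout
plot_incremental_transactions(mbgnbd, elog, 'ORDER_DATE', 'CUSTOMER_ID',
                              t_cal + t, t_cal, freq='D')
sns.despine()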
Cohorts can be customers who exhibit similar traits over a period of time or customers acquired in a particular period. There isn't a clear industry-wide definition, and segments and cohorts are sometimes used interchangeably. In common usage, though, a cohort has a time variable associated with it and a segment does not.
CLV can vary drastically across cohorts: the December cohort can have a higher CLV than the March cohort, or the December-to-May cohorts can have a higher CLV than the June-to-November ones. Identifying valuable cohorts and ensuring higher retention across them becomes essential in the long run.
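One simple way to compare cohorts is to group customers by the month of their first purchase and look at average revenue per customer in each group. The sketch below does that on a toy order list; the function name and data are hypothetical, and historical revenue per customer is only a rough stand-in for a predicted CLV.

```python
from collections import defaultdict
from datetime import date


def cohort_average_revenue(orders):
    """Average revenue per customer, keyed by acquisition month (YYYY-MM)."""
    first_purchase = {}
    revenue = defaultdict(float)
    for customer_id, order_date, amount in orders:
        # Track each customer's earliest order date and total spend
        if customer_id not in first_purchase or order_date < first_purchase[customer_id]:
            first_purchase[customer_id] = order_date
        revenue[customer_id] += amount

    cohort_revenue = defaultdict(list)
    for customer_id, first in first_purchase.items():
        cohort_revenue[first.strftime("%Y-%m")].append(revenue[customer_id])

    return {cohort: sum(values) / len(values)
            for cohort, values in sorted(cohort_revenue.items())}


# Toy orders: (customer_id, order_date, amount), for illustration only
toy_orders = [
    ("a", date(2017, 12, 5), 100.0),
    ("a", date(2018, 2, 1), 300.0),
    ("b", date(2017, 12, 20), 200.0),
    ("c", date(2018, 3, 3), 50.0),
]
averages = cohort_average_revenue(toy_orders)
for cohort, value in averages.items():
    print(cohort, value)
```

In this toy data the December cohort is worth more per customer than the March cohort, which is the kind of gap that would justify focusing retention effort on it.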
All models, whether well-researched and carefully developed or plain vanilla, serve some business purpose that drives revenue. Online stores use these models to recommend products on the homepage. Some might say that's what a recommendation system is for, and they would be right, but the crux of any business is to drive revenue, and that's where CLV comes into play.
Example: Consider two users, A and B, both browsing for baby products in the kids' category. A has a high CLV and B has a lower CLV. Ordinarily, A and B would be recommended the same content, but because of their difference in CLV, A will be shown content that drives long-term value: instead of a stroller, which is a one-time buy, A could be shown baby cereal (Rs 300 per pack) and diapers (Rs 1,000 per pack). B, on the other hand, will be shown a stroller (which costs about Rs 3K to 5K), since B has a low CLV and is less likely to return to the platform. Here's the unit economics: over 2 years User A would have spent at least 20K, whereas User B would have made a one-time purchase of 5K. An ideal recommendation model would use CLV intelligence to get the best value out of a user.
Facebook, for example, doesn't show the content that's best for the user; if it did, a 15-year-old would always receive Khan Academy videos and other academic content. Instead, it shows content that keeps the user on the platform, which in turn brings in higher revenue and profits Facebook. This is true across all platforms and apps, be it shopping, social media, payments, financial services, or food delivery.
CLV isn't just a model, nor is it just a number; it's the valuation of a business. In real terms, a CLV that trends downward for more than four quarters could be the beginning of a job search for some, and for others an opportunity to identify pain points and solve them, figure out areas of growth, and work towards them.
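The unit economics of the two users can be sketched in a few lines. The prices come from the example above; the purchase frequency of eight baskets a year is my own assumption, picked so that User A's two-year spend lands just above the 20K mentioned.

```python
def repeat_buyer_value(basket_price, purchases_per_year, years):
    """Total spend of a customer who keeps buying the same basket."""
    return basket_price * purchases_per_year * years


# Prices from the example: one pack of cereal plus one pack of diapers
basket = 300 + 1000

# Assumption: User A reorders the basket eight times a year for two years
user_a_spend = repeat_buyer_value(basket, purchases_per_year=8, years=2)

# User B buys a stroller once, at the upper end of the quoted range
user_b_spend = 5000

print(f"User A: Rs {user_a_spend}, User B: Rs {user_b_spend}")
```

Even though a single stroller costs several times more than one basket, the repeat buyer is worth roughly four times as much over the two years under these assumptions.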
The actionability of CLV models can be broadly divided into customer loyalty and customer retention. A few other actionable items are listed below:
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.