In business, there are some golden rules that everyone needs to appreciate and adopt.
Eighty percent of our business comes from 20% of our customers.It costs 10 times less to sell to an existing customer than to find a new customer
The above golden rules define why we need Customer Lifetime value.
In our stock index, NIFTY 50 defines how our stock market is performing similarly, in business, it’s important to understand who your top customers are that give consistent and increasing revenue streams for your business.
One of the simple and effective methodologies which are generally used in calculating customer value over a time frame is RFM which is,
Recency (R): How recently a customer has made a purchase
Frequency (F): How often a customer makes a purchase
Monetary Value (M): Dollar value of the purchases
We will go a little deeper into RFM analysis in the following section:
Let’s take a small example where a bank wants to identify key customers for retention/development/acquisition.
So in the above scenario, we need to score each customer who had recent transactions with the bank on three important metrics mentioned above R, F & M.Then create a scoring methodology to segment the customer base and apply for different marketing programs.
Let’s calculate the RFM score for 5 sample customers,
Preferably RFM is done for recent data and will be refreshed on a quarterly/half-yearly basis based on the business
Finding R, F and M are pretty simple. Let’s say a customer deposited 10 K money on May 1st and deposited another 5 k on June 10th and if you are doing RFM analysis on July 1st. Now for this customer, the Recency will be 1 month because the last transaction was in June and Frequency will be 2 because he made two deposits in May and June and M will be 15 K
For Recency, smaller the better, because of the customer, we are on top of his mind and for Frequency & Monitory larger values are better
Let’s take the above table as an example, when compared to all customers, recency is best for customer 3 as he is ranked as number 1, whereas for frequency he is in 4th position, and in terms of the value he is in 2nd position.
Standardize = current value/Max(Value) * 100
Consolidated Score = 0.15*R + 0.28*F + 0.57*M
We just applied those weights to each customer.
For example,
Customer 4 value = 0.15 *40 + 0.28 *60 + 0.57*60 = 57
Then we segregated score as three segments,
Now based on the above scores, a business can apply the differentiation strategy like retention/development/acquisition of different customer segments
Further, most we can profile these segments with additional features like demographics, spending pattern, and several products e.t.c; understand them a little deeper.
Now let’s try to implement this RFM analysis in python.
This is a Case study, where we are using a European retail chain data set.
Sample data is as follows:
For our RFM analysis, the key important features we will be using are InvoicDate, CustomerID and for sales, we are using Quantity and Unit Price
# Import Packages
import numpy as np import pandas as pd import time, warnings import datetime as dt warnings.filterwarnings("ignore") # Get the Data # Read the data df=pd.read_csv("retail_data.csv") df.head() # RFM Analysis **RFM** (Recency, Frequency, Monetary) analysis first we need to create three features R , F & M from the data lets create those features ## Recency # To calculate recency, we need to find out **when was the customer's most recent purchase.**. # Create a new column called date which contains the date of invoice only df['date'] = pd.DatetimeIndex(df['InvoiceDate']).date # Group by customers and check last date of purchase recency_df = df.groupby(by='CustomerID', as_index=False)['date'].max() recency_df.columns = ['CustomerID','LastPurshaceDate'] # Calculate recent date to find recency wrt to this date recent_date=recency_df.LastPurshaceDate.max() print(recent_date) # Calculate recency recency_df['Recency'] = recency_df['LastPurshaceDate'].apply(lambda x: (recent_date - x).days) recency_df.head()
Now in the same way we will calculate both Frequency and Monetary values
# ## Frequency # To calculate Frequency we need to check **How often a customer makes a purchase**. # Drop duplicates df1= df df1.drop_duplicates(subset=['InvoiceNo', 'CustomerID'], keep="first", inplace=True) # Calculate the frequency of purchases frequency_df = df1.groupby(by=['CustomerID'], as_index=False)['InvoiceNo'].count() frequency_df.columns = ['CustomerID','Frequency'] frequency_df.head() # ## Monetary # To calculate Monetary value **How much money did the customer spent during the timeframe?** # Create column total cost df['TotalCost'] = df['Quantity'] * df['UnitPrice'] monetary_df = df.groupby(by='CustomerID',as_index=False).agg({'TotalCost': 'sum'}) monetary_df.columns = ['CustomerID','Monetary'] monetary_df.head() # ## Create RFM Table # Merge recency dataframe with frequency dataframe temp_df = recency_df.merge(frequency_df,on='CustomerID') temp_df.head() # Merge with monetary dataframe to get a table with the 3 columns rfm_df = temp_df.merge(monetary_df,on='CustomerID') # Use CustomerID as index rfm_df.set_index('CustomerID',inplace=True) # Check the head rfm_df.head()
# Rank each metric R , F & M rfm_df['R_rank'] = rfm_df['Recency'].rank( ascending=False) rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True) rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True) rfm_df.head() # normalize each rank with Max rank rfm_df['R_rank_norm']=(rfm_df['R_rank']/rfm_df['R_rank'].max())*100 rfm_df['F_rank_norm']=(rfm_df['F_rank']/rfm_df['F_rank'].max())*100 rfm_df['M_rank_norm']=(rfm_df['F_rank']/rfm_df['M_rank'].max())*100 rfm_df.head() # Now apply our equation and create final score **Consolidated Score = 0.15*R_rank_norm + 0.28*F_rank_norm + 0.57M_rank_norm** rfm_df['RFM_Score']=0.15*rfm_df['R_rank_norm']+0.28*rfm_df['F_rank_norm']+0.57*rfm_df['M_rank_norm'] rfm_df=rfm_df.round(0) rfm_df.head()
# ## Customer segments with RFM Model # # Segment customers based on RFM score # 0 - 50 - Low valued customer # 50 - 75 - Medium valued customer # 76 - 100 - High valued customer rfm_df["Customer_segment"]=np.where(rfm_df['RFM_Score'] > 75 ,"High Value Customer",(np.where(rfm_df['RFM_Score'] < 50 , "Low value Customer" ,"Medium Value Customer"))) rfm_df.head()
High Value Customer: They are your loyal customers.So provide constant support via customer serviceMedium Value Customer: Send them personalized emails with offers and encourage them to shop moreLow Value Customer: These customers are about to churn or went into a dormant stage,apply reactivation strategies for them
In python and R we have multiple packages which support RFM analysis and there are multiple ways to derive an RFM score also.
Hope you like this article. Happy learning 🙂
The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.
Hi Manoj, Could you please help me the dataset. Vijay.
is it valid if max(ran_value) is different for R,F,M ?