This article was published as a part of the Data Science Blogathon.
“Don’t over-promise, don’t underpromise, deliver what you promise.”
Customer satisfaction is one of the key areas to focus on for establishing any successful business. A satisfied customer is the biggest asset for any organization. Word of mouth lends credibility to the brand and helps to attract new customers. This saves money spent on publicity and campaign to acquire new customers. We will focus on the e-commerce industry and areas to enhance customer experience and satisfaction. For example
When a user places an order on any e-commerce site, a few calculations are done at the backend, and the final delivery date is presented on the screen. It is generally referred to as “Get by date, “Delivery date,” “customer promise date,” or “In-Hand date.” A customer promise date is a date by which an e-commerce promise to deliver a product to the customer.
Meeting customer promise dates is of most importance for any company. Customer promise date is calculated by taking various factors like sourcing location, processing and packing time, transit time, holidays, cost, customer zip area, etc.
Product delivery also gets affected by the below points.
Every firm works on the ‘deliver what you promise’ principle. However, an order can be delivered on-Time or Early or it can be delayed. Early delivery (ED) of orders might be delighter for the customer, but it is not an ideal situation for any firm. ED also means there is an opportunity to optimize the Location-Carrier TNT.
Missing customer promise date is the worst situation for any firm. It loses brand value, customers, and revenue. Late delivery of products also means that firms are overpromising. Hence there is an opportunity to adjust TNT. Ideally, the orders should be delivered On-Time. To meet on-time delivery firm takes various steps like upgrading shipping methods and prioritizing delivery, which adds up to the operational cost.
Once the order is packed and moved to “ready to be picked by carrier “ status, the Carrier picks up the order for delivery. Transit-in time is the time the carrier takes from order pickup to order delivery to the customer. Presently, TNTs are provided by carriers for each zip area combination.
We will try to generate data-driven TNTs and use them further for calculating the Customer promise date. Data-driven TNTs would be more realistic than fixed TNTs for zip area combinations. TNTs may differ by carrier for the same combination of zip areas. Therefore, TNTs for a carrier depend upon their network in an area, their service in the area, if the area is serviceable and accessible, etc. For all same-day delivery, TNT is 1 day. For Non -the same day delivery, it may be 2,3,4,5,6, etc
Sourcing is an algorithm that identifies DCs/Store locations where to source orders from. The store location is selected based on the inventory, cost, and distance from the customer’s zip area. Once the source location is finalized, sourcing will release order information to the identified location.
Once an algorithm identifies a sourcing location, the orders are released to stores. Orders are picked up by the store, they have them checked on inventory again, validated with the customer orders, and packed before carriers pick it up for delivery.
Processing and packing time depend on the type of orders. Orders can be same-day delivery orders or non-same-day orders. Time varies between 15 minutes to 2 hours.
Time by when an order should be pick-ready. The hypothesis is that we tend to miss a lot of customer promise dates when we have misalignment on cut-off time.
Cut off time= (Customer promise date – TNT) + processing time
For example, if an order has a customer promise date as 2019-05-28, TNT of 2, the processing time is 15:30, then cut off time will be equal to 2019-05-26 at 15:30:00
Third-party or In-house delivery partners are responsible for picking up orders from DC/Stores and delivering orders to customers based on customer preferred time window. Currently, the firms use TNT provided by the carrier to calculate the customer promise date. There can be multiple carriers operating in-store locations and the customer zip area. Hence identifying the best performing carrier for a given combination will help reduce missing customer promise dates. Carrier performance also depends on the volume of orders it can manage.
The calculation for the customer promise date :
CPD= Processing & packaging time + cutoff time +TNT(provided by carrier)
New calculation for Customer promise date :
CPD= Processing & packaging time + cutoff time +TNT(data-driven)
Pictorial representation for Store to the home order life cycle
Reading and Cleaning the data
Python Code:
import pandas as pd
import numpy as np
df = pd.read_csv('order_details.csv')
df.drop(df[df['MCD']=='FAILED_DELIVERY_ATTEMPT'].index, axis=0, inplace = True)
df.DELIVER_TIME = pd.to_datetime(df.DELIVER_TIME)
df.SHIPTIME = pd.to_datetime(df.SHIPTIME)
print(df.isnull().sum())
print(df.head())
This data is at the order line level. In order, there can be one or more different SKUs. Each SKU is given a unique no which is referred to as order_line_key.
So, if we look at 1st row in the data, this order is single line order. It was sourced from store ABC_82439009370862 to customer zip 32163. It was picked up by carrier DL, with a given tracking number and Customer promise date as 2021-02-13, delivery date as 2021-02-12. It was delivered earlier than CPD. TNT for source location and customer zip combination was 2.
On careful inspection, we can observe it was shipped and delivered on the same day. Hence actual TNT is one day.
Data dictionary
Dropping NA’s from the data set
df_test= df_test.dropna(subset =['order_line_key', 'order_no', 'zip_code', 'locId', 'CAR_NAME', 'TNT','TRACKNO', 'CPD', 'SHIPTIME', 'MCD','DELIVER_TIME'])
Renaming columns
df_test.rename(columns={'SHIPTIME': 'Ship_Time', 'DELIVER_TIME': 'Delivered_Time'}, inplace=True)
Calculate Actual TNT, Time in days from Ship_Time to delivered time, excluding the non-busy day
df_test=df_test.assign(Actual_TNT= np.busday_count( df_test['Ship_Time'].dt.date , df_test['Delivered_Time'].dt.date) + 1 )
if ‘Ship_Time’ is 6th Aug, and Delivered_Time is 7th. It took 2 days to deliver(6th and 7th). If we count days between 6th and 7th it would result in 1 day. Hence adding +1 to count to align it to 2 days. It would also ignore non-business days in the count.
Actual_TNT for each order_line_key
Calculate the day of the week for filtering out holidays/non-pickup days/non-delivery days for each carrier
df_test=df_test.assign(Actual_TNT_dayweek= (df_test['Delivered_Time'].dt.dayofweek)) df_test=df_test.assign(Actual_ship_day= (df_test['Ship_Time'].dt.dayofweek)) df_test['CPD']=pd.to_datetime(df_test['CPD']) df_test['Delivered_Time']=pd.to_datetime(df_test['Delivered_Time'])
OTD: On-time delivery, where CPD is equal to Delivered time. We always want to maximize OTD
df_test['OTD']= df_test['CPD'].dt.date == df_test['Delivered_Time'].dt.date
ED: Early delivery, where CPD is less than to Delivered time
df_test['ED']=df_test['CPD'].dt.date > df_test['Delivered_Time'].dt.date
MIHD: Missing customer promise date, where CPD is greater than to Delivered time
df_test['MIHD']= df_test['CPD'].dt.date < df_test['Delivered_Time'].dt.date
Extracting the first 3 digits from the zip code
df_test['zip_3digit']=[x[:3] for x in df_test['zip_code']]
All carrier, in this case, has the same non-pickup and non-delivery days. Applying a filter to remove any non-pickup and non-delivery days for the carrier
df_test= df_test[df_test['Actual_TNT_dayweek'].isin([0,1,2,3,4])] df_test= df_test[df_test['Actual_ship_day'].isin([0,1,2,3,4])]
Calculate the Median of Actual TNTs for TNT-source location- customer location area combination. A MIHD rate and ED rate for comparison and decision making.
df=df_test.groupby(['TNT','locId','zip_3digit']).agg(Mediantnt=('Actual_TNT','median'),vol=('TRACKNO','count'),MIHD=('MIHD','sum'),ED=('ED','sum'),OTD=('OTD','sum')).assign(rateMIHD= lambda x : x.MIHD / x.vol,rateED= lambda x : x.ED / x.vol).sort_values(['vol'],ascending=False).reset_index() df['Mediantnt']=round(df['Mediantnt']) df.head(19)
Final output
Output interpretation:
Consider 0th row; 39 orders were shipped from store ABC_82335000473013 to customer location zip area 945. All 39 orders were associated with TNT = 5. The median of Actual TNT is 3, which is less than TNT shared by the carrier. Here is an opportunity to reduce TNT from 5 to 3 for ABC_82335000473013 and zip area 945. The reduction of TNT is also supported by the fact that all 39 orders were delivered early.
‘Mediantnt’ is more realistic and data-driven. We use data-driven TNT instead of fixed TNT provided by the carrier. A similar approach can be applied to identify the best-performing carrier for a given TNT, store location, or zip area combination.
Reduce TNT scenario:
In all cases where TNT is greater than ‘Mediantnt,’ and the rate of early delivery is above .85%, TNT can be set to ‘Mediantnt’.
Increase TNT scenario
In all cases where TNT is less than ‘Mediantnt,’ and rateMIHD is above .85%
TNT can be set to ‘Mediantnt.’
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Very well written, to the point (crisp), and informative too. Well done 👍 keep it up! God bless!!
This Blog is very helpful!! Thanks Azim