Festive season special: Building models on seasonal data

Tavish Srivastava Last Updated : 24 Apr, 2015


Has your model ever failed on out-of-time validation because of seasonality?

If so, one likely reason is seasonality in performance or seasonality in the cohort. This article explains how to identify both and then takes you through the industry-standard techniques for accounting for seasonality while building a model.

[stextbox id=”section”]Why does seasonality exist?[/stextbox]

Most industries have a seasonal business trend. Seasonality in a business implies that customers' propensity to buy a product is skewed towards certain periods of the year. This skew can arise for many reasons; one of the most common is a market environment shared by all customers. For example, the Indian financial year ends on 31st March. Because there are tax rebates on insurance products, people tend to buy more insurance just before the end of the financial year to claim these rebates. Hence, March is peak season for the insurance industry in India. An analysis of the past 10 years of insurance business data shows that 25-30% of the Indian insurance industry's business comes in the month of March. Similarly, there is a surge in sales of consumer goods in the UK and US leading up to Christmas.

The figure below shows a plot of the monthly sales trend of a seasonal industry. Note that the sales peaks and troughs fall in the same months year after year. In other words, the sales pattern stays the same from one year to the next.
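One simple way to check for this pattern is to compute each month's share of its year's sales and see whether the peak lands on the same calendar month every year. The snippet below is a minimal sketch of that idea; the function names and the sales figures are made up for illustration and do not come from any real dataset.

```python
from collections import defaultdict


def monthly_share(sales):
    """Return {year: {month: share of that year's total sales}}.

    `sales` maps (year, month) -> sales value.
    """
    totals = defaultdict(float)
    for (year, _month), value in sales.items():
        totals[year] += value
    shares = defaultdict(dict)
    for (year, month), value in sales.items():
        shares[year][month] = value / totals[year]
    return dict(shares)


def peak_month(shares_for_year):
    """Return the calendar month with the largest share of annual sales."""
    return max(shares_for_year, key=shares_for_year.get)


# Made-up numbers for illustration only: flat sales with a spike every March.
sales = {
    (year, month): (400.0 if month == 3 else 100.0)
    for year in (2013, 2014)
    for month in range(1, 13)
}

shares = monthly_share(sales)
peaks = {year: peak_month(by_month) for year, by_month in shares.items()}
print(peaks)                       # peak month per year
print(round(shares[2014][3], 3))   # March share of 2014 sales
```

If the peak month and its share stay roughly constant across years, the business is seasonal in the sense described above.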

[stextbox id=”section”]What is the impact if a model is built without considering seasonality in business?[/stextbox]

Seasonality has a negative impact on both predictive and descriptive models if it is not treated explicitly. Let's take a descriptive model and see how seasonality affects it. The following is a simple decision tree in which we have segmented the customer portfolio into 3 segments based on their attrition rate over the next 1 month. Let's say the month in which we observe attrition is January (portfolio attrition rate 30%).

Now say we apply the model to predict the probability of attrition for the month of July. Seasonality can introduce the following errors:

1. The rank ordering among segments might change from 1-2-3.

2. The overall attrition rate might move away from 30%, which changes the attrition rate of each individual segment as well.

Both errors make the model less effective when it is applied to a different month.
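Both errors can be checked directly by comparing segment-level attrition between the month the model was built on and the month it is applied to. The sketch below assumes hypothetical segment counts; only the 30% January portfolio rate comes from the example above, and the July figures are invented to show what a failure would look like.

```python
def attrition_rates(counts):
    """Map {segment: (attrited, total)} to {segment: attrition rate}."""
    return {seg: attrited / total for seg, (attrited, total) in counts.items()}


def overall_rate(counts):
    """Portfolio-level attrition rate across all segments."""
    attrited = sum(a for a, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return attrited / total


def rank_order(rates):
    """Segments sorted from highest to lowest attrition rate."""
    return sorted(rates, key=rates.get, reverse=True)


# Hypothetical counts: (customers who attrited, customers in segment).
january = {"segment_1": (50, 100), "segment_2": (30, 100), "segment_3": (10, 100)}
july = {"segment_1": (20, 100), "segment_2": (25, 100), "segment_3": (15, 100)}

# Error 1: has the rank ordering of the segments changed?
rank_changed = rank_order(attrition_rates(january)) != rank_order(attrition_rates(july))

# Error 2: has the overall attrition rate shifted?
rate_shift = overall_rate(july) - overall_rate(january)

print(rank_changed, round(rate_shift, 2))
```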

Predictive models are even more seriously affected by seasonality, because the second error leads to a large deviation in the predictive power of the model. To build an accurate predictive or descriptive model on seasonal data, we need at least 12 months of data for training the model.

[stextbox id=”section”]Types of Seasonality[/stextbox]

There are two types of seasonality that need to be addressed in any model:

1. Seasonality in Performance: This is the simpler type of seasonality to address. The insurance business example from the beginning of the article illustrates it well. Say we want to predict the performance (business sourced) of a sales agent over the next 3 months. Because the business is seasonal, performance seasonality needs to be addressed to build a stable predictive model.

2. Seasonality in Cohort: This is the tougher type of seasonality to address. Seasonality in cohort is driven by differences in the characteristics of the base population across months. Whenever the cohort is seasonal, performance is seasonal as well. Say we want to predict the performance (business sourced) of a sales agent in their first 12 months. We know that sourcing business in March is much easier than in any other month. Hence, sales agents on-boarding in Jan, Feb and Mar will have a higher average first-3-month performance than agents on-boarding in any other month. A good start is highly correlated with higher overall 12-month performance. Hence, we have seasonality in the cohort, and agents on-boarding in Jan, Feb and Mar should be treated differently.
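The cohort effect in the sales agent example can be made concrete with a few lines of arithmetic. The sketch below assumes a made-up seasonal profile (flat business with a March spike) and a hypothetical helper, `first_window_business`; it simply shows which on-boarding months get March inside their first 3 months.

```python
def first_window_business(onboard_month, business_by_month, window=3):
    """Total business over the first `window` calendar months after on-boarding.

    Months wrap around the year end, so a December joiner covers Dec, Jan, Feb.
    """
    total = 0.0
    for k in range(window):
        calendar_month = (onboard_month - 1 + k) % 12 + 1
        total += business_by_month[calendar_month]
    return total


# Made-up seasonal profile: business per agent is flat except for a March spike.
business_by_month = {m: (400.0 if m == 3 else 100.0) for m in range(1, 13)}

first_3m = {m: first_window_business(m, business_by_month) for m in range(1, 13)}
baseline = min(first_3m.values())

# On-boarding months whose first 3 months beat the baseline cohort.
advantaged_cohorts = [m for m in range(1, 13) if first_3m[m] > baseline]
print(advantaged_cohorts)
```

Under this assumed profile only the Jan, Feb and Mar cohorts stand out, which is why they need separate treatment.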

[stextbox id=”section”]Industry Standard techniques to address seasonality[/stextbox]

Three methods are used industry-wide to address the two types of seasonality described in the last section:

1. Long interval target function: Seasonality in performance can be addressed by taking a 12-month-long performance window, so that every window covers each calendar month exactly once. But this method fails if the seasonality exists in the cohort.

2. Use of same training and scoring target month: This technique addresses both seasonality issues. Say we want to predict March attrition. We train the model on last year's March and then use the same model to predict March attrition this year. This technique is robust but fails if there is any difference in characteristics between the training and scoring months. Say the company changed the definition of attrition after March last year. In this case the technique will not take recent attrition trends into account and will give misleading predictions for this March.

3. Mix of cohort: This technique is used mainly in risk modelling. It addresses both seasonality issues. We take a mix of samples from all the different types of cohort and use it as the training population. This is the most robust technique for addressing seasonality in both performance and cohort. It also takes recent trends into account while making the prediction, so it is better than the previous technique when the characteristics of the target and training months differ. But when the target and training months are exactly alike, the previous technique gives better predictions, because mixing cohorts offsets the target variable.
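The three techniques above mostly differ in how the target window or the training population is chosen, which can be sketched in a few lines. Everything in the snippet below is an illustrative assumption: the record layout, the function names, and in particular the choice to draw an equal-size sample from every calendar month for the cohort mix (the technique itself only calls for a mix of samples from all cohorts).

```python
import random


def twelve_month_target(monthly_values, start_index):
    """Technique 1: sum performance over a 12-month window."""
    return sum(monthly_values[start_index:start_index + 12])


def same_month_training(records, target_month, target_year):
    """Technique 2: train on the same calendar month of the previous year."""
    return [
        r for r in records
        if r["month"] == target_month and r["year"] == target_year - 1
    ]


def mixed_cohort_training(records, per_month, seed=0):
    """Technique 3: sample from every calendar month's cohort.

    Equal-size sampling per month is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_month = {}
    for r in records:
        by_month.setdefault(r["month"], []).append(r)
    sample = []
    for month in sorted(by_month):
        pool = by_month[month]
        sample.extend(rng.sample(pool, min(per_month, len(pool))))
    return sample


# Made-up data: 24 months of performance with a March spike, and
# 5 hypothetical customers per month across two years.
monthly_values = [400.0 if (i % 12) + 1 == 3 else 100.0 for i in range(24)]
records = [
    {"id": f"{year}-{month}-{i}", "year": year, "month": month}
    for year in (2013, 2014)
    for month in range(1, 13)
    for i in range(5)
]

# A 12-month window gives the same total whichever month it starts in.
window_totals = {twelve_month_target(monthly_values, s) for s in range(13)}

same_month = same_month_training(records, target_month=3, target_year=2014)
mixed = mixed_cohort_training(records, per_month=2)
print(window_totals, len(same_month), len(mixed))
```

With this made-up series the 12-month target is identical for every start month, which is what removes performance seasonality; it does nothing about who is in the cohort, which is why the first technique fails on cohort seasonality.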

[stextbox id=”section”]Final Notes[/stextbox]

Out-of-time validation helps us identify whether the model's performance is being altered by seasonality. Techniques like bootstrapping and the jackknife can only check the stability of the model and cannot check the effect of seasonality on it.
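The difference between the two kinds of check comes down to which months the validation sample can contain. Below is a minimal sketch, assuming a hypothetical record layout with `year` and `month` fields and a cutoff chosen purely for illustration: a bootstrap resample only ever redraws the development months, while the out-of-time sample contains months the model has never seen.

```python
import random


def out_of_time_split(records, cutoff):
    """Split records into development (before cutoff) and out-of-time (from cutoff on).

    `cutoff` is a (year, month) tuple.
    """
    development = [r for r in records if (r["year"], r["month"]) < cutoff]
    out_of_time = [r for r in records if (r["year"], r["month"]) >= cutoff]
    return development, out_of_time


def bootstrap(sample, seed=0):
    """Resample with replacement, same size as the input sample."""
    rng = random.Random(seed)
    return [rng.choice(sample) for _ in sample]


# Made-up records: 3 hypothetical customers per month in one year.
records = [
    {"id": f"{month}-{i}", "year": 2014, "month": month}
    for month in range(1, 13)
    for i in range(3)
]

development, out_of_time = out_of_time_split(records, cutoff=(2014, 7))
resample = bootstrap(development)

development_months = {r["month"] for r in development}
resample_months = {r["month"] for r in resample}
out_of_time_months = {r["month"] for r in out_of_time}
print(sorted(resample_months), sorted(out_of_time_months))
```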

Do you think this provides a solution to any problem you face? How do you address seasonality in your modelling? Are there any other techniques you use to improve the performance of your models (prediction or stability)? Do let us know your thoughts in the comments below.




sandhya

Hi Tanish, Can we use the above models for time series techniques?

Kumar

Sales forecasting (time series) uses decomposition, followed by smoothing and recomposition, to address seasonality and trend patterns in the data.

Franklin

Hi Tanish, nice explanation. I am still confused about some terms: observation period, performance window, and development period. I'd appreciate it if you could clarify them. Thanks.


The observation period is used to populate the independent variables in the training window. The performance window is used to populate the dependent variable in the training window. Development and scoring are the performance windows of training and scoring respectively. Hope this helps.
