Various Techniques to Detect and Isolate Time Series Components Using Python

Shailesh Last Updated : 18 May, 2024


Introduction

Whenever we talk about building better forecasting models, the first step is detection. Decomposing a time series into its trend, seasonal, and cyclical components, and removing their effects, is important for the quality of the data we feed into a model, because a stationary series with no significant trend or seasonal component is rare in practice. Many techniques are available, so understanding their advantages and disadvantages and choosing the right one plays a vital role in meeting the objective. In this article, we shall learn the essential steps for selecting a decomposition technique through the practical application of each one using Python.

Keeping the above objective in mind, I have structured the article to give in-depth details on the techniques for detecting and removing the various time series components.

This article was published as a part of the Data Science Blogathon

Introduction
Components of Time Series Forecasting
2. Detecting Trends and Detrending the Data
3. Detect Seasonality and De-seasoning
4. Detecting Cyclical Variation
5. Error, Irregular Component, and Residuals
6. Time Series Decomposition
Difference between Seasonality and Cyclicity
Conclusion

Components of Time Series Forecasting

Time-series data has four major components, as shown in the below figure. Before we proceed further, it is essential to get acquainted with these components and with how they differ from one another. The classical components are trend, seasonality, cyclical variation, and the irregular component (noise); the list below also includes the level, the baseline around which the others move.

The level is the average value of the time-series data.
The trend is a long-term increase or decrease in the time-series data.
Seasonality is a pattern that repeats at a fixed period in the time-series data.
Noise is the random variation in the time-series data.

Graphically, all these aforesaid components can be distinguished as per the below figure:

Graphical Presentation of Time-Series Component
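To make these components concrete, here is a minimal sketch that builds a synthetic monthly series by adding a level, a trend, a seasonal pattern, and noise. All numbers and names (`level`, `trend`, `seasonality`, `noise`, `n_months`) are hypothetical and chosen only for illustration; they are not taken from the article's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
n_months = 48                            # four years of hypothetical monthly data
t = np.arange(n_months)

level = 30.0                                       # baseline value
trend = 0.2 * t                                    # steady upward movement
seasonality = 2.0 * np.sin(2 * np.pi * t / 12)     # pattern repeating every 12 months
noise = rng.normal(0.0, 0.5, n_months)             # random variation

# Additive combination of the four parts
series = level + trend + seasonality + noise
print(series[:5].round(2))
```

Plotting `series` next to each individual part gives a picture similar to the component figure above.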

2. Detecting Trends and Detrending the Data

2.1 Detecting Trends

Traditional forecasting techniques (Moving Average & Exponential Smoothing) work well for fairly stable data with no significant trend or seasonality. Before applying any forecasting model, the best practice is to check for trend and seasonality, as many time-series datasets show both; finding and removing these components gives a better forecast. The below figure shows a flow chart that can be referred to as a general procedure for handling time series data.

Let’s experiment with our learning on the real-world industry-related dataset.

Case-1 is about the Steel Wastage Sales Dataset over a period of 4 years (2018-2022), where a project infrastructure company has recorded its steel waste sales data and wants to forecast the selling rate for reconciliation of project cost. Using Python libraries, let’s try to visualize the data.

The easiest way to begin detecting trends is to draw a line plot with the seaborn library and look for any long-term upward or downward movement.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic; drop this line when running as a plain script
%matplotlib inline
# Load the scrap-rate data (adjust the path to where the workbook is stored)
df = pd.read_excel(r"C:Users..Time Series ForecastingScrapRate04.xlsx")
# Line plot of the selling rate against the sale date
plt.figure(figsize=(15,5))
sns.lineplot(x='Date', y='Rate in Rs./Kg.', data=df, legend=True, color='r', label='Actual Trend')
plt.ylabel('Scrap Rate', fontsize=15)
plt.xlabel('Date of Sale', fontsize=15)
plt.title('Steel Scrap Rate:2018-2022', fontsize=20);

The plot shows Steel Waste Sales Data from 2018 to 2022

Prima facie, our data looks to follow an upward trend from 2021 onward, but to support this observation more rigorously, we rely on the robust methods mentioned below.

2.1.1 Detecting Trend Using a Hodrick-Prescott Filter

The HP filter is one of the most widely used techniques for detecting trends in time series datasets. Mathematically it can be expressed (fig-5) as the sum of two terms: (1) the sum of squared deviations of the series from the trend, (Yt – Tt), which penalizes the cyclical component, and (2) λ times the sum of squared second differences of the trend component, which penalizes variation in the growth rate of the trend.

In the above figure, L denotes the lag operator, which shifts a series back by one period. In practice, common values of λ are 100 for yearly data, 1600 for quarterly data, and 14,400 for monthly data. The larger the value of λ, the larger the penalty on variation in the growth rate of the trend, and the smoother the resulting trend.
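For reference, the objective described above can be written out as follows. This is the standard textbook form of the HP filter, using the article's symbols Y_t for the observed series and T_t for the trend; n, the number of observations, is a symbol introduced here.

```latex
\min_{T_1,\dots,T_n}\;
\sum_{t=1}^{n} \left(Y_t - T_t\right)^2
\;+\;
\lambda \sum_{t=2}^{n-1} \left[\left(T_{t+1} - T_t\right) - \left(T_t - T_{t-1}\right)\right]^2
```

In lag-operator notation the bracketed term equals (1 - L)^2 T_{t+1} = T_{t+1} - 2T_t + T_{t-1}, the second difference of the trend.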

from statsmodels.tsa.filters.hp_filter import hpfilter
sw_cycle,sw_trend = hpfilter(df['Rate in Rs./Kg.'], lamb=100)
sw_trend.plot(figsize=(10,5)).autoscale(axis='x',tight=True) 
plt.title('Detecting Trend using HP Filter', fontsize=20)
plt.xlabel('Days', fontsize=15)
plt.ylabel('Steel Waste Sales Rate', fontsize=15)
plt.show()

Looking at the above figure, an upward trend is clearly visible, which supports the assumption we made from the line plot.

2.2 Detrending Time Series

Detrending is the process of removing the trend from time series data. Identifying, modeling, and sometimes removing the trend can noticeably improve a model. The below flow chart shows the significance of detecting the trend before attempting any statistical modeling techniques.

2.2.1 Pandas Differencing (First Order)

Differencing the original time series is a usual approach for converting a non-stationary process to a stationary one. The first difference of Yt is computed by subtracting the previous day’s value from the current day’s value.

Mathematically it can be expressed as:

$$\Delta Y_t = Y_t - Y_{t-1}$$

The pandas function diff() works on both Series and DataFrame objects and returns the differences directly. Its periods argument sets how many rows to shift when forming the difference (1 by default). Let’s plot the first difference (the day’s value minus the previous day’s) using a line plot with the following lines of code.

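# First-order difference: each value minus the previous one (the first row becomes NaN)
df['diff'] = df['Rate in Rs./Kg.'].diff()
plt.figure(figsize=(15,6))
# A label is required, otherwise plt.legend() has nothing to show
plt.plot(df['diff'], color='g', label='First difference')
plt.title('Detrending using Differencing', fontsize=20)
plt.xlabel('Days', fontsize=15)
plt.ylabel('Steel Waste Rate', fontsize=15)
plt.legend()
plt.show()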

Observed Data after removal of trend using Pandas Differencing

Using the differencing method, we can see that the trend has been removed, and the plot now has no apparent upward or downward movement. Here first-order differencing was enough to eliminate the trend; if it is not, second- or third-order differencing may be required.
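The need for a higher order of differencing can be seen on a small synthetic example. The sketch below uses two made-up series (`linear` and `quadratic` are hypothetical names, not columns of the steel dataset): one difference flattens a linear trend, while a quadratic trend needs two.

```python
import numpy as np
import pandas as pd

t = np.arange(10)
linear = pd.Series(5.0 + 2.0 * t)            # linear trend
quadratic = pd.Series(5.0 + 0.5 * t ** 2)    # curved (quadratic) trend

first_linear = linear.diff().dropna()                 # first-order difference
first_quadratic = quadratic.diff().dropna()           # first-order difference
second_quadratic = quadratic.diff().diff().dropna()   # second-order difference

print(first_linear.unique())       # a single constant value: trend removed
print(first_quadratic.tolist())    # still increasing: trend remains
print(second_quadratic.unique())   # a single constant value: trend removed
```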

2.2.2 SciPy Signals

A signal is another form of time series data. The SciPy library helps us remove a linear trend from such data: the detrend function in its ‘signal’ module subtracts a least-squares straight-line fit. We can plot the detrended series using the below lines of code.

from scipy import signal
import warnings
warnings.filterwarnings("ignore")
# Subtract the least-squares straight-line fit from the rate column
# (the same column used in the other sections of this article)
detrended = signal.detrend(df['Rate in Rs./Kg.'].values)
plt.figure(figsize=(15,6))
plt.plot(detrended)
plt.xlabel('Days', fontsize =15)
plt.ylabel('Steel Waste Sales Rate', fontsize= 15)
plt.title('Detrending using Scipy Signal', fontsize=20)
plt.show()
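What linear detrending does can be sketched by hand with NumPy: fit a straight line by least squares and subtract it, which is what signal.detrend does with its default type='linear'. The series `y` below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)
y = 10.0 + 0.3 * t + rng.normal(0.0, 1.0, t.size)   # linear trend plus noise

slope, intercept = np.polyfit(t, y, deg=1)   # least-squares straight line
detrended = y - (slope * t + intercept)      # subtract the fitted line

print(round(slope, 3))                       # close to the true slope of 0.3
print(abs(detrended.mean()) < 1e-8)          # residuals are centered on zero
```

Because only a straight line is removed, a curved trend would survive this step; differencing or the HP filter are better suited to that case.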

2.2.3 HP Filters (Hodrick-Prescott)

Along with detecting the trend (already explained in section 2.1.1), this technique has become a benchmark for removing trend movement. It is broadly employed in applied macroeconomic research (e.g., international economic agencies, government macroeconomic research, etc.). This non-parametric technique has a single tuning parameter, λ, which controls the degree of smoothing. The trend it estimates is free of short-term fluctuations; subtracting that trend from the data leaves the detrended series.

Since the dataset given here is treated as yearly data, we shall use a λ value of 100 with the below lines of code.

from statsmodels.tsa.filters.hp_filter import hpfilter
import warnings
warnings.filterwarnings("ignore")
sw_cycle,sw_trend = hpfilter(df['Rate in Rs./Kg.'],lamb=100)
df['hptrend'] = sw_trend
df['hpdetrended'] = df['Rate in Rs./Kg.'] - df['hptrend']
plt.figure(figsize=(15,6))
plt.plot(df['hpdetrended'], color='darkorange')
plt.title('Detrending using HP Filter', fontsize=20)
plt.xlabel('Days', fontsize=15)
plt.ylabel('Steel Waste Sales Rate', fontsize=15)
plt.show()

The above plot (Fig-5) shows that the smooth trend has been removed from the data, leaving the short-term fluctuations around it.

Limitations:

Works best when the noise in the data is white noise or approximately normally distributed.
Gives reliable results for the analysis of static or historical data, but can mislead predictions for dynamically varying data.
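To see how λ controls the smoothness of the HP trend, the filter can be sketched directly in NumPy by solving the linear system that minimizes its objective. The helper names `hp_trend` and `roughness` and the synthetic series `y` are assumptions made for this illustration; for real work, statsmodels' hpfilter used above is the practical choice.

```python
import numpy as np

def hp_trend(y, lamb):
    """HP-filter trend: solve (I + lamb * K'K) trend = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # K is the (n-2) x n second-difference matrix; each row holds [1, -2, 1]
    K = np.zeros((n - 2, n))
    for i in range(n - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(n) + lamb * K.T @ K, y)

def roughness(trend):
    """Sum of squared second differences: the term that lamb penalizes."""
    return float(np.sum(np.diff(trend, n=2) ** 2))

rng = np.random.default_rng(2)
t = np.arange(60)
y = 20.0 + 0.1 * t + np.sin(t / 3.0) + rng.normal(0.0, 0.3, t.size)

for lamb in (1, 100, 14400):
    trend = hp_trend(y, lamb)
    cycle = y - trend                  # detrended series for this lamb
    print(lamb, round(roughness(trend), 4))
```

The printed roughness shrinks as λ grows, which is the "larger penalty, smoother trend" behavior described in section 2.1.1.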

3. Detect Seasonality and De-seasoning

Seasonality is a periodic fluctuation in which the same pattern recurs at regular intervals within the calendar year. It is measured by the seasonality index.

3.1 Detect Seasonality

To detect seasonality, two popular methods are employed.

3.1.1 Multiple Box Plots

A box plot summarizes the spread of the data by showing the first quartile, the median, and the third quartile, along with the overall range of a given dataset. Using the below lines of code, seasonality can be detected.

df['year'] = pd.DatetimeIndex(df['Date']).year
plt.figure(figsize=(15,6))
sns.boxplot(x='year', y='Rate in Rs./Kg.', data=df).set_title("Multi Year-wise Box Plot")
plt.show()

Reading the above plot (fig-6), the average rate increases from January to March, which suggests a seasonality effect. Because the boxes above are grouped by year, a month-wise grouping and a year-on-year comparison help confirm this in more detail.
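A month-wise grouping can be sketched as follows. The data frame `demo`, its `Rate` column, and the built-in January to March bump are all synthetic assumptions, used only to show the mechanics of extracting the month and comparing groups.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
dates = pd.date_range("2018-01-01", periods=48, freq="MS")    # month-start dates
# Hypothetical seasonal effect: higher values in January to March
seasonal = np.tile([3, 3, 3, 0, 0, 0, -1, -1, -1, -2, -2, -2], 4)
demo = pd.DataFrame({"Date": dates,
                     "Rate": 30.0 + seasonal + rng.normal(0.0, 0.3, 48)})

demo["month"] = demo["Date"].dt.month                  # 1 = January, ..., 12 = December
monthly_median = demo.groupby("month")["Rate"].median()
print(monthly_median.round(2))
```

Passing x='month' instead of x='year' to sns.boxplot draws the same comparison as one box per calendar month.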

3.1.2 Auto Correlation Plot

Autocorrelation is used to check randomness in data. For data with unknown periodicity, it helps in identifying the period. For instance, for monthly data with a regular seasonal effect, we would expect to see pronounced peaks at lags that are multiples of 12 months. The below plot demonstrates an example of detecting seasonality with the help of an autocorrelation plot.

from pandas.plotting import autocorrelation_plot
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize':(10,4), 'figure.dpi':100})
autocorrelation_plot(df['Rate in Rs./Kg.'].tolist())

Detecting Season Index using Auto-Correlation Plot

Sometimes, identifying seasonality is tricky, so other plots, such as sequence or seasonal subseries plots, help instead. Here the autocorrelation values, read as a seasonality index, vary from 0.75 to -0.25.
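The peaks described above can also be checked numerically. The sketch below builds a synthetic monthly series with a 12-month cycle (the name `monthly` and all values are hypothetical) and prints the autocorrelation at a few lags using pandas' Series.autocorr.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
t = np.arange(120)                                   # ten years of monthly points
monthly = pd.Series(10.0 + 3.0 * np.sin(2 * np.pi * t / 12)
                    + rng.normal(0.0, 0.5, t.size))

# Strong positive values at lags 12 and 24, strong negative at half-period lags
for lag in (6, 12, 18, 24):
    print(lag, round(monthly.autocorr(lag=lag), 2))
```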

3.2 Deseasoning Time Series

Deseasoning means removing seasonality from time-series data, that is, stripping the pattern of seasonal effects from the data.

Decomposition splits a time series into its components so that each can be examined or removed. We can use the seasonal_decompose function from Python’s statsmodels library to remove seasonality from the data. This leaves the data with only the trend, cyclic, and irregular variations.

from statsmodels.tsa.seasonal import seasonal_decompose

# period=12 assumes a 12-observation seasonal cycle; 'period' replaces the older 'freq' argument
result_mul = seasonal_decompose(df['Rate in Rs./Kg.'], model='multiplicative', extrapolate_trend='freq', period=12)
# In a multiplicative model the seasonal factor is divided out, not subtracted
deseason = df['Rate in Rs./Kg.'] / result_mul.seasonal
plt.figure(figsize=(15,6))
plt.plot(deseason)
plt.title('Deseasoning using seasonal_decompose', fontsize=16)
plt.xlabel('Days')
plt.ylabel('Steel Waste Sales Rate')
plt.show()

Deseasoning Time series using Seasonal - Decomposition
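The idea behind multiplicative deseasoning can be sketched by hand with pandas: estimate the trend with a centered 12-period moving average, average the ratio of data to trend for each position in the cycle, and divide the data by that seasonal index. The series and the `factors` array below are synthetic assumptions; this is a simplified illustration of the classical approach, not a reproduction of statsmodels' internals.

```python
import numpy as np
import pandas as pd

t = np.arange(48)
# Hypothetical seasonal factors for the 12 positions in the cycle (they average 1)
factors = np.array([1.2, 1.1, 1.0, 0.9, 0.8, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.1])
trend = 50.0 + 0.5 * t
observed = pd.Series(trend * np.tile(factors, 4))     # multiplicative series, no noise

# Centered 12-period moving average (a 2x12 MA) as the trend estimate
trend_est = observed.rolling(12).mean().rolling(2).mean().shift(-6)

ratio = observed / trend_est                 # data relative to trend
index = ratio.groupby(t % 12).mean()         # average ratio per cycle position
index = index / index.mean()                 # normalize so the index averages 1

# Divide the seasonal index out of the observed series
deseasoned = observed / index.to_numpy()[t % 12]
print(index.round(2).tolist())
```

The recovered index is close to the factors used to build the series, and `deseasoned` follows the underlying trend.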

4. Detecting Cyclical Variation

The variations in a time series that arise from business cycles are called the cyclical component. It is a fluctuation around the trend line caused by macroeconomic changes such as recession, unemployment, etc. Cyclical fluctuations recur, but with more than a year between repetitions, so they are less frequent than seasonal ones. We shall use the HP filter again to detect the cyclical effect in the data.

As already explained in sections 2.1.1 and 2.2.3, we can again use the hpfilter function from statsmodels to derive the cyclical variation with the below lines of code.

sw_cycle,sw_trend = hpfilter(df['Rate in Rs./Kg.'], lamb=100)
df['cycle'] =sw_cycle
df['trend'] =sw_trend
df[['cycle']].plot(figsize=(15,6)).autoscale(axis='x',tight=True)
plt.title('Extracting Cyclic Variations', fontsize=20)
plt.xlabel('Days')
plt.ylabel('Steel Waste Sales Rate', fontsize =15)
plt.show()

Detecting Cyclical Variations using HP Filters

5. Error, Irregular Component, and Residuals

When trend, seasonality, and cyclical behavior are removed, the pattern left behind, which cannot be explained, is called the irregular component. Various techniques are available to examine it, such as probability theory, moving averages, and auto-regressive methods. In practice, the cyclical variation is often left as part of the residuals. Time series decomposition, covered in the next section, isolates these components.
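As a minimal sketch of what "left behind" means, the example below removes an estimated trend and seasonal pattern from a synthetic additive series and inspects the residual. The series, the seasonal pattern `seasonal_true`, and the noise level are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
t = np.arange(72)
seasonal_true = np.tile([2, 1, 0, -1, -2, -1, 0, 1, 0, 0, -1, 1], 6).astype(float)
noise = rng.normal(0.0, 0.4, t.size)
observed = pd.Series(40.0 + 0.3 * t + seasonal_true + noise)

# Trend estimate: centered 12-period moving average
trend_est = observed.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal estimate: mean detrended value per cycle position, centered on zero
detrended = observed - trend_est
seasonal_by_pos = detrended.groupby(t % 12).mean()
seasonal_by_pos = seasonal_by_pos - seasonal_by_pos.mean()
seasonal_est = seasonal_by_pos.to_numpy()[t % 12]

# Residual: whatever trend and seasonality do not explain
residual = (observed - trend_est - seasonal_est).dropna()
print(round(residual.mean(), 3), round(residual.std(), 3))
```

A residual that hovers around zero with no visible pattern suggests the other components have captured the structure in the data.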

6. Time Series Decomposition

Time series data can be modeled as the sum or the product of trend (Tt), seasonal (St), cyclical (Ct), and irregular (It) components: Yt = Tt + St + Ct + It for an additive model, and Yt = Tt × St × Ct × It for a multiplicative model.

Additive models assume that seasonality and cyclical component are independent of the trend. These are not very common since, in many cases, the seasonality component may not be independent of the trend. The additive model can be used for time series data where linear trends are formed wherein changes are constant over time.

Multiplicative models are commonly used for many datasets across industries. For building a forecasting model, only the trend and seasonal components are usually considered. Estimating a cyclical component requires a dataset spanning more than 10 years; since such long datasets are rarely available, cyclical components are rarely used for modeling. Multiplicative models ideally perform well for non-linear trends (quadratic or exponential).

Time Series Modeling Additive & Multiplicative
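One way to see how the two model forms relate: taking logarithms turns a multiplicative series into an additive one. The sketch below checks this on synthetic components (all values are hypothetical, and the irregular part is set to 1 so the check is exact).

```python
import numpy as np

t = np.arange(24)
trend = 100.0 * 1.02 ** t                              # exponential growth
seasonal = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)      # multiplicative factor around 1
irregular = np.full(t.size, 1.0)                       # no noise in this sketch

y = trend * seasonal * irregular                       # multiplicative model
log_y = np.log(y)
log_sum = np.log(trend) + np.log(seasonal) + np.log(irregular)   # additive in logs

print(np.allclose(log_y, log_sum))

# The seasonal swing in y grows with the trend, a sign that a multiplicative form fits
swing = np.abs(y - trend)
print(swing[:12].max() < swing[12:].max())
```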

We shall use Python’s statsmodels library to obtain the time series decomposition.

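from statsmodels.tsa.seasonal import seasonal_decompose
# 'period' replaces the older 'freq' argument; 12 assumes a 12-observation seasonal cycle
tsm_decompose = seasonal_decompose(np.array(df['Rate in Rs./Kg.']), model = 'multiplicative', period = 12)
# plot() creates its own figure, so resize that figure instead of calling plt.figure first
tsm_plot = tsm_decompose.plot()
tsm_plot.set_size_inches(15, 5)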

The decomposition shows the increasing trend in the dataset. Seasonality is also visible, with a seasonal index ranging between -0.5 and 0.5. We now add the decomposed components to our dataset as two new columns, ‘seasonal’ and ‘trend’.

df['seasonal'] = tsm_decompose.seasonal
df['trend'] = tsm_decompose.trend
df[30:35] #Final Dataset Just for ref.

Difference between Seasonality and Cyclicity

Seasonality and cyclicity are both recurring patterns in data, but they differ in their predictability and timescale:

Seasonality:
- Predictable and fixed period.
- Tied to calendar events, often yearly (e.g., ice cream sales peak in summer).
- Easier to forecast due to consistent timing.
Cyclicity:
- Unpredictable and variable period.
- Fluctuations can last for years (e.g., business cycles).
- More difficult to forecast due to uncertain duration and intensity.

Conclusion

We have learned to isolate time series components such as trend, seasonality, and cyclical effects using multiple techniques for better forecasting accuracy. However, interpreting the outcomes of these techniques also plays an important role in the context of a domain-specific problem statement. While working with the real-world problem statements of a specific company, it is also beneficial for a data scientist to get acquainted with the business processes of that organization and to gain a fair understanding of the time-series data it provides, along with input from domain experts.

Differencing is to be applied in ascending order (i.e., first order, then second order, etc.) until the trend is eliminated.
The SciPy technique removes linear trends, while the HP filter also handles non-linear trends, smooths out short-term fluctuations, and has a tuning parameter, λ, that controls the degree of smoothing.
Practically, it has always been tricky to identify seasonality, so using other plots, such as sequence or seasonal subseries plots, helps instead. That means we must go deeper into the data (i.e., year-on-year basis comparison) to detect the seasonality.
HP Filters are widely used for detecting trends, removing trends, and detecting Cyclical Variations.
Additive models are used for linear trends, and multiplicative models can be used for non-linear types of data, such as Quadratic and exponential modeling.
Time series decomposition isolates the time series components (i.e., level, trend, seasonality, and residuals).

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Shailesh

A Data Science Enthusiast and loves to work on Data Science Projects! Willing to solve complex business problems using data science with help of the right application of Statistical Tests & applications using Python & R! Along with this, an understanding of the business Input-Output models and domain expertise also plays an important role.
