Multivariate Multi-step Time Series Forecasting with Stacked LSTM Seq2Seq Autoencoder in TensorFlow 2.0/Keras

Suggula Last Updated : 31 Jan, 2025

In sequence-to-sequence (seq2seq) learning, an RNN model is trained to map an input sequence to an output sequence. The input and output need not be of the same length. A seq2seq model contains two RNNs, here LSTMs implemented with the Keras LSTM layer, which act as an encoder and a decoder. The encoder converts the input sequence into a fixed-length vector that acts as a summary of the input sequence.

This fixed-length vector is called the context vector. The context vector is given as input to the decoder, and the final encoder state is used as the initial decoder state, to predict the output sequence. Sequence-to-sequence learning is used in language translation, speech recognition, time series forecasting, etc.

Application in Time Series Forecasting

We will use sequence-to-sequence learning for time series forecasting, since this architecture makes a multi-step forecast straightforward. We will add two layers to the architecture: a repeat vector layer and a time distributed dense layer.

A repeat vector layer repeats the context vector we get from the encoder so that it can be passed as an input to the decoder. We repeat it n times, where n is the number of future steps you want to forecast. The decoder returns an output for every time step. The time distributed dense layer applies the same fully connected dense layer to each time step, producing a separate output for each one. TimeDistributed is a wrapper that allows applying a layer to every temporal slice of an input.
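
To make the shapes concrete, here is a minimal NumPy sketch of what these two layers do to a tensor. It is not the Keras implementation: the batch size of 4, the random weights `W` and `b`, and the stand-in `decoder_out` array are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, units, n_future, n_features = 4, 100, 5, 7  # illustrative sizes

# Context vector produced by the encoder: one vector per sample
context = rng.normal(size=(batch, units))

# RepeatVector(n_future): copy the context vector along a new time axis
repeated = np.repeat(context[:, np.newaxis, :], n_future, axis=1)

# Stand-in for the decoder LSTM output (one vector per future step)
decoder_out = rng.normal(size=(batch, n_future, units))

# TimeDistributed(Dense(n_features)): the same W and b are applied at every step
W = rng.normal(size=(units, n_features))
b = np.zeros(n_features)
y = decoder_out @ W + b

print(repeated.shape)  # (batch, n_future, units)
print(y.shape)         # (batch, n_future, n_features)
```

Because the dense weights are shared across time steps, the number of parameters in that layer does not grow with the forecast horizon.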

We will also stack additional layers on the encoder and decoder parts of the sequence-to-sequence model. Stacking LSTMs may increase the model’s ability to learn more complex representations of the time-series data in its hidden layers by capturing information at different levels.

This article was published as a part of the Data Science Blogathon.

What is the role of LSTM Layer in Keras?

The LSTM (Long Short-Term Memory) layer in Keras plays a vital role in modeling sequential data. The design addresses the challenges of capturing and processing long-term dependencies within sequential input. The layer contains memory cells that can retain information over extended periods, enabling the network to learn patterns and relationships in sequences such as time series or natural language data.

LSTM layers excel in mitigating the vanishing gradient problem associated with traditional RNNs. This problem occurs when gradients diminish during backpropagation, limiting the network’s ability to learn long-term dependencies. LSTMs address this by utilizing a gating mechanism that regulates the flow of information into and out of memory cells. This allows them to selectively retain or discard information, facilitating the modeling of complex sequential patterns.
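
The gating mechanism can be sketched as a single LSTM time step in NumPy. This is a simplified illustration of the standard LSTM equations, not the Keras source: the function name `lstm_step`, the weight layout (four gates concatenated along the last axis), and the random toy inputs are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a batch of inputs."""
    z = x_t @ W + h_prev @ U + b          # shape: (batch, 4 * units)
    i, f, g, o = np.split(z, 4, axis=-1)  # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g   # keep part of the old memory, write new content
    h_t = o * np.tanh(c_t)     # expose a gated view of the memory
    return h_t, c_t

rng = np.random.default_rng(0)
batch, n_in, units = 2, 7, 3  # toy sizes
W = rng.normal(size=(n_in, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(10):  # run over a toy sequence of 10 steps
    x_t = rng.normal(size=(batch, n_in))
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)
```

The cell state `c_t` is updated additively (`f * c_prev + i * g`), which is what lets information and gradients survive across many time steps when the forget gate stays close to 1.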

By incorporating LSTM layers into a neural network, the model gains the capability to capture and understand dependencies across multiple time steps or positions in the input sequence. This makes LSTMs particularly useful in various applications, including machine translation, sentiment analysis, speech recognition, and time series forecasting, where understanding and modeling the temporal relationships is crucial for accurate predictions.

Code of Keras LSTM Layer to Predict Electric Power Consumption

The data used is the Individual household electric power consumption dataset, which is available from the UCI Machine Learning Repository.

Importing Libraries

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import tensorflow as tf
import os

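Now load the dataset into a pandas data frame, combining the Date and Time columns into a single datetime index.

df = pd.read_csv(r'household_power_consumption.txt', sep=';', header=0, low_memory=False)
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
df = df.drop(columns=['Date', 'Time']).set_index('datetime')
df.head()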

Imputing Null Values

df = df.replace('?', np.nan)
df.isnull().sum()

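Now we will create a function that imputes each missing value with the value recorded at the same time on the previous day.

def fill_missing(values):
    # values is a 2-D float array; NaNs are replaced in place
    one_day = 60*24
    for row in range(values.shape[0]):
        for col in range(values.shape[1]):
            if np.isnan(values[row, col]):
                values[row, col] = values[row-one_day, col]
    return values
df = df.astype('float32')
# Fill a copy of the data and write it back, rather than relying on df.values being a writable view
df = pd.DataFrame(fill_missing(df.to_numpy(copy=True)), index=df.index, columns=df.columns)
df.isnull().sum()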
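
The previous-day imputation can be checked on a tiny array. The sketch below is a variant written for illustration, not the article's exact function: the name `fill_from_previous_period`, the `period` parameter, and the toy data are assumptions. It returns a filled copy and leaves a gap untouched when there is no earlier period to copy from, whereas the loop above indexes `row - one_day`, which is negative for rows in the first day and therefore wraps around to the end of the array.

```python
import numpy as np

def fill_from_previous_period(values, period):
    """Return a copy where each NaN is replaced by the value `period` rows earlier."""
    filled = values.copy()
    # np.where yields positions in increasing row order, so earlier gaps are
    # filled before later gaps that may depend on them
    for row, col in zip(*np.where(np.isnan(filled))):
        if row >= period:
            filled[row, col] = filled[row - period, col]
    return filled

# Toy data: one column, a "day" of 3 observations, gaps at rows 3 and 6
toy = np.array([[1.0], [2.0], [3.0], [np.nan], [5.0], [6.0], [np.nan]])
filled = fill_from_previous_period(toy, period=3)
print(filled.ravel())
```

Row 6 is filled from row 3, which was itself filled from row 0, so a gap repeated on consecutive days is still resolved.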

Downsampling of Data from minutes to Days

There are more than 2 million minute-level observations recorded. Let’s make the data simpler by downsampling it from a frequency of minutes to days.

daily_df = df.resample('D').sum()
daily_df.head()
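
As a small sketch of what `resample('D').sum()` does, the example below uses made-up data: the dates, the column name `power`, and the constant value of 1.0 per minute are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy minute-level series that starts at 18:00, so the first day is only
# partly covered: 6 hours on day one, then one full day
index = pd.date_range("2021-01-01 18:00", periods=6 * 60 + 24 * 60, freq="min")
toy = pd.DataFrame({"power": np.ones(len(index))}, index=index)

# One row per calendar day, each holding the sum of that day's minutes
daily = toy.resample("D").sum()
print(daily)
```

With one unit per minute, the partly covered first day sums to 360 and the full day to 1440, which shows that a daily sum over an incomplete day is not comparable to the others.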

Train – Test Split

After downsampling, the number of instances is 1442. We will split the dataset into train and test data in a 75% to 25% ratio (0.75 * 1442 ≈ 1081).

train_df,test_df = daily_df[1:1081], daily_df[1081:]

Scaling the values

The columns in the data frame are on different scales. We will scale the values to the range -1 to 1 for faster training of the models. Each scaler is fit on the training data only and then reused on the test data.

# Work on copies so the original frames are not modified (this also avoids SettingWithCopyWarning)
train = train_df.copy()
scalers={}
for i in train_df.columns:
    scaler = MinMaxScaler(feature_range=(-1,1))
    # Fit one scaler per column on the training data
    s_s = scaler.fit_transform(train[i].values.reshape(-1,1))
    s_s=np.reshape(s_s,len(s_s))
    scalers['scaler_'+ i] = scaler
    train[i]=s_s
test = test_df.copy()
for i in train_df.columns:
    # Reuse the scaler fitted on the training data
    scaler = scalers['scaler_'+i]
    s_s = scaler.transform(test[i].values.reshape(-1,1))
    s_s=np.reshape(s_s,len(s_s))
    test[i]=s_s
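
The arithmetic behind min-max scaling to the range -1 to 1 can be written out by hand. This is a sketch of the formula that `MinMaxScaler` documents, not a replacement for it; the helper names `fit_minmax`, `transform` and `inverse_transform` and the toy numbers are assumptions for illustration.

```python
import numpy as np

def fit_minmax(train_col, feature_range=(-1.0, 1.0)):
    """Learn the minimum and maximum from the training column only."""
    low, high = feature_range
    return float(np.min(train_col)), float(np.max(train_col)), low, high

def transform(col, params):
    data_min, data_max, low, high = params
    return (col - data_min) / (data_max - data_min) * (high - low) + low

def inverse_transform(scaled, params):
    data_min, data_max, low, high = params
    return (scaled - low) / (high - low) * (data_max - data_min) + data_min

train_col = np.array([10.0, 20.0, 30.0])
test_col = np.array([15.0, 40.0])

params = fit_minmax(train_col)
train_scaled = transform(train_col, params)
test_scaled = transform(test_col, params)
restored = inverse_transform(test_scaled, params)

print(train_scaled, test_scaled, restored)
```

Because the minimum and maximum come from the training data, a test value outside the training range (40 here) is mapped outside -1 to 1. That is expected and is the price of not leaking test statistics into the scaler.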

Converting the series to samples

Now we will write a function that uses a sliding window to transform the series into samples of past observations (inputs) and future observations (outputs), so that supervised learning algorithms can be applied.

def split_series(series, n_past, n_future):
  # n_past ==> number of past observations in each sample
  # n_future ==> number of future observations in each sample
  X, y = list(), list()
  for window_start in range(len(series)):
    past_end = window_start + n_past
    future_end = past_end + n_future
    if future_end > len(series):
      break
    # slicing the past and future parts of the window
    past, future = series[window_start:past_end, :], series[past_end:future_end, :]
    X.append(past)
    y.append(future)
  return np.array(X), np.array(y)

For this case, let’s assume that, given the observations from the past 10 days, we need to forecast the observations for the next 5 days.

n_past = 10
n_future = 5 
n_features = 7

Now convert both the train and test data into samples using the split_series function.

X_train, y_train = split_series(train.values,n_past, n_future)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1],n_features))
y_train = y_train.reshape((y_train.shape[0], y_train.shape[1], n_features))
X_test, y_test = split_series(test.values,n_past, n_future)
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1],n_features))
y_test = y_test.reshape((y_test.shape[0], y_test.shape[1], n_features))
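
To see what the sliding window produces, here is a compact restatement of `split_series` applied to a toy array. The windows are the same as in the function above, only the loop bound is written explicitly; the toy series of 20 rows and 2 features is an assumption for illustration.

```python
import numpy as np

def split_series(series, n_past, n_future):
    """Slice a 2-D series into (past, future) windows that advance one step at a time."""
    X, y = [], []
    # The last valid start leaves room for n_past + n_future rows
    for start in range(len(series) - n_past - n_future + 1):
        past_end = start + n_past
        X.append(series[start:past_end, :])
        y.append(series[past_end:past_end + n_future, :])
    return np.array(X), np.array(y)

toy = np.arange(40, dtype=float).reshape(20, 2)  # 20 time steps, 2 features
X, y = split_series(toy, n_past=10, n_future=5)

print(X.shape)  # (samples, n_past, n_features)
print(y.shape)  # (samples, n_future, n_features)
```

A series of length N yields N - n_past - n_future + 1 samples, so the 20-row toy series gives 6 samples.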

Model Architecture

Now we will create two models with the architectures described below.

E1D1 ==> Sequence to Sequence Model with one encoder layer and one decoder layer.

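# E1D1
# n_features ==> no of features at each timestep in the data.
# Encoder: a single LSTM that also returns its final hidden and cell states
encoder_inputs = tf.keras.layers.Input(shape=(n_past, n_features))
encoder_l1 = tf.keras.layers.LSTM(100, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)

# encoder_outputs1 is [last output, hidden state, cell state]; keep the two states
encoder_states1 = encoder_outputs1[1:]

# Repeat the encoder output (the context vector) once per forecast step
decoder_inputs = tf.keras.layers.RepeatVector(n_future)(encoder_outputs1[0])

# Decoder: an LSTM initialised with the encoder states, followed by a per-timestep dense layer
decoder_l1 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_inputs,initial_state = encoder_states1)
decoder_outputs1 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(decoder_l1)

# Build the model from the encoder input to the decoder output
model_e1d1 = tf.keras.models.Model(encoder_inputs,decoder_outputs1)

# Print the layer-by-layer summary
model_e1d1.summary()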

E2D2 ==> Sequence to Sequence Model with two encoder layers and two decoder layers.

# E2D2
# n_features ==> no of features at each timestep in the data.
# Encoder: two stacked LSTMs; the first returns the full sequence so the second can consume it
encoder_inputs = tf.keras.layers.Input(shape=(n_past, n_features))
encoder_l1 = tf.keras.layers.LSTM(100,return_sequences = True, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
encoder_l2 = tf.keras.layers.LSTM(100, return_state=True)
encoder_outputs2 = encoder_l2(encoder_outputs1[0])
encoder_states2 = encoder_outputs2[1:]
# Repeat the output of the last encoder layer (the context vector) once per forecast step
decoder_inputs = tf.keras.layers.RepeatVector(n_future)(encoder_outputs2[0])
# Decoder: each LSTM layer is initialised with the states of the matching encoder layer
decoder_l1 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_inputs,initial_state = encoder_states1)
decoder_l2 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_l1,initial_state = encoder_states2)
decoder_outputs2 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(decoder_l2)
# Build the model from the encoder input to the decoder output
model_e2d2 = tf.keras.models.Model(encoder_inputs,decoder_outputs2)
# Print the layer-by-layer summary
model_e2d2.summary()
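
As a cross-check on the `summary()` output, the trainable parameter counts of both models can be estimated by hand. The sketch below assumes the default Keras LSTM and Dense layers with biases enabled; the helper names `lstm_params` and `dense_params` are hypothetical and exist only for this example.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias; TimeDistributed shares these across time steps
    return input_dim * units + units

n_features, units = 7, 100

# E1D1: one encoder LSTM, one decoder LSTM, one dense output layer
e1d1 = (lstm_params(n_features, units)
        + lstm_params(units, units)
        + dense_params(units, n_features))

# E2D2: two encoder LSTMs, two decoder LSTMs, one dense output layer
e2d2 = (lstm_params(n_features, units) + lstm_params(units, units)
        + 2 * lstm_params(units, units)
        + dense_params(units, n_features))

print("E1D1 parameters:", e1d1)
print("E2D2 parameters:", e2d2)
```

Under these assumptions E2D2 has a little more than twice as many parameters as E1D1, which is worth keeping in mind given that the training set holds only about a thousand days.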

Training the models

I have used the Adam optimizer with Huber loss as the loss function. The learning rate starts at 1e-3 and is multiplied by 0.90 after every epoch. Let’s compile and train the models.

reduce_lr = tf.keras.callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.90 ** x)
model_e1d1.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e1d1=model_e1d1.fit(X_train,y_train,epochs=25,validation_data=(X_test,y_test),batch_size=32,verbose=0,callbacks=[reduce_lr])
model_e2d2.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e2d2=model_e2d2.fit(X_train,y_train,epochs=25,validation_data=(X_test,y_test),batch_size=32,verbose=0,callbacks=[reduce_lr])
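
The two training choices above can be reproduced numerically. The sketch below mirrors the lambda passed to `LearningRateScheduler` and writes out the Huber loss formula with a threshold of 1.0, which is assumed here to match the Keras default; the function names `learning_rate` and `huber` and the toy inputs are assumptions for illustration.

```python
import numpy as np

def learning_rate(epoch, initial=1e-3, decay=0.90):
    """Exponentially decayed learning rate, same formula as the scheduler lambda."""
    return initial * decay ** epoch

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for large ones, averaged over all elements."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * error - 0.5 * delta ** 2
    return float(np.mean(np.where(error <= delta, quadratic, linear)))

rates = [learning_rate(epoch) for epoch in range(25)]
print("first epoch:", rates[0], "last epoch:", rates[-1])

loss_small = huber(np.array([0.0]), np.array([0.5]))  # quadratic branch
loss_large = huber(np.array([0.0]), np.array([3.0]))  # linear branch
print(loss_small, loss_large)
```

Over 25 epochs the learning rate drops by more than a factor of ten, and the linear branch of the loss keeps a single large error from dominating the gradient the way it would under squared error.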

Prediction on test samples

pred_e1d1=model_e1d1.predict(X_test)
pred_e2d2=model_e2d2.predict(X_test)

Inverse Scaling of the predicted values

Now we will convert the predictions to their original scale.

for index,i in enumerate(train_df.columns):
    scaler = scalers['scaler_'+i]
    # Undo the scaling for the predictions and the targets of this feature
    pred_e1d1[:,:,index]=scaler.inverse_transform(pred_e1d1[:,:,index])
    pred_e2d2[:,:,index]=scaler.inverse_transform(pred_e2d2[:,:,index])
    y_train[:,:,index]=scaler.inverse_transform(y_train[:,:,index])
    y_test[:,:,index]=scaler.inverse_transform(y_test[:,:,index])

Checking Error

Now we will calculate the mean absolute error for each feature and each forecast day.

from sklearn.metrics import mean_absolute_error
for index,i in enumerate(train_df.columns):
  print(i)
  for j in range(1,n_future+1):
    print("Day ",j,":")
    print("MAE-E1D1 : ",mean_absolute_error(y_test[:,j-1,index],pred_e1d1[:,j-1,index]),end=", ")
    print("MAE-E2D2 : ",mean_absolute_error(y_test[:,j-1,index],pred_e2d2[:,j-1,index]))
  print()
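
The same per-day, per-feature errors can be computed in one step with NumPy. This is an equivalent sketch rather than the article's code: the function name `mae_per_step` and the toy arrays are assumptions for illustration.

```python
import numpy as np

def mae_per_step(y_true, y_pred):
    """MAE averaged over samples; the result has shape (n_future, n_features)."""
    return np.mean(np.abs(y_true - y_pred), axis=0)

# Toy arrays shaped like the test targets: (samples, n_future, n_features)
y_true = np.zeros((3, 5, 7))
y_pred = np.ones((3, 5, 7))
y_pred[:, 4, :] = 2.0  # make the last forecast day worse on purpose

mae = mae_per_step(y_true, y_pred)
print(mae.shape)
print(mae[:, 0])  # error of the first feature for each forecast day
```

Keeping the result as a (days, features) table makes it easy to see whether the error grows with the forecast horizon.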

From the above output, we can observe that, in some cases, the E2D2 model performs better than the E1D1 model, with lower error. Training several models with different numbers of stacked layers and combining them into an ensemble can also perform well.

Note: The results vary with respect to the dataset. If we stack more layers, it may also lead to overfitting. So the number of layers to be stacked acts as a hyperparameter.

Conclusion

Congratulations, you have learned how to implement multivariate multi-step time series forecasting using TF 2.0 / Keras. This is my first attempt at writing a blog. So please share your opinion in the comments section below.

Thanks for reading.

Frequently Asked Questions

Q1. What does LSTM do in Keras?

A. In Keras, LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) layer. LSTM networks capture and process sequential information, such as time series or natural language data, by mitigating the vanishing gradient problem found in traditional RNNs. LSTM layers provide memory cells that retain information over long periods, making them effective for modeling temporal dependencies in sequential data.

Q2. How do I use LSTM layers in Keras?

A. To use LSTM layers in Keras, you can follow these steps:
1. Import the necessary modules from Keras.
2. Create a sequential model or functional model.
3. Add an LSTM layer using LSTM() and specify the desired number of units and other parameters.
4. Optionally, add additional LSTM layers or other types of layers.
5. Compile and train the model using appropriate data and settings.
6. Evaluate or make predictions using the trained model.

Q3. Why use LSTM instead of CNN?

A. In certain scenarios, researchers prefer LSTMs (Long Short-Term Memory) over CNNs (Convolutional Neural Networks) because LSTMs excel at capturing sequential dependencies in data, such as time series or natural language data. In contrast, CNNs are better suited for extracting spatial features from fixed-size inputs like images. LSTMs’ ability to retain long-term information and model temporal dependencies makes them suitable for tasks involving sequential data analysis.

DS_learner

Thanks for the code. Excellent job, However from what I understand is the prediction is done on 6 separate features and the prediction is how would the feature behave in the next time stamp. Instead is it possible to predict something like the weather in the next hour based on features in the previous hour ? What changes should I make to the above script ?

Ivan

Hello, I'm getting error: " A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" when I run those part of code : train = train_df scalers={} for i in train_df.columns: scaler = MinMaxScaler(feature_range=(-1,1)) s_s = scaler.fit_transform(train[i].values.reshape(-1,1)) s_s=np.reshape(s_s,len(s_s)) scalers['scaler_'+ i] = scaler train[i]=s_s test = test_df for i in train_df.columns: scaler = scalers['scaler_'+i] s_s = scaler.transform(test[i].values.reshape(-1,1)) s_s=np.reshape(s_s,len(s_s)) scalers['scaler_'+i] = scaler test[i]=s_s

Michal

Thanks for great article. Just curious / newbie question: - by multi-variate you mean you are forecasting more features or you calculate forecast of one feature based on others? Thanks!
