TimesFM for Time-Series Forecasting

Mounish V | Last Updated: 28 Sep, 2024

Introduction

The Time Series Foundation Model, or TimesFM in short, is a pretrained time-series foundation model developed by Google Research for forecasting univariate time-series. As a pretrained foundation model, it simplifies the often complex process of time-series analysis. Google Research says that their time-series foundation model exhibits zero-shot forecasting capabilities that rival the accuracy of leading supervised forecasting models across multiple public datasets.

TimesFM 1.0

Overview

  • TimesFM is a pretrained model developed by Google Research for univariate time-series forecasting, providing zero-shot prediction capabilities that rival leading supervised models.
  • TimesFM is a transformer-based model with 200 million parameters, designed to predict future values of a single variable based on its historical data, supporting context lengths up to 512 points.
  • It exhibits strong forecasting accuracy on unseen datasets, leveraging its transformer layers and tunable hyperparameters such as model dimensions, patch lengths, and horizon lengths.
  • The demo uses TimesFM on Kaggle’s electric production dataset, where it forecasts accurately with small errors (e.g., MAE = 3.34), closely tracking the actual data.
  • TimesFM is an advanced model that simplifies time-series analysis while achieving near state-of-the-art accuracy in predicting future trends across various datasets without needing additional training.

Background

A time series consists of data points collected at consistent time intervals, such as daily stock prices or hourly temperature readings. Forecasting such data is often complex due to elements like trends, seasonal variations, and erratic patterns. These challenges can hinder accurate predictions of future values, but models like TimesFM are designed to streamline this task.

Understanding TimesFM Architecture

TimesFM 1.0 is a 200M-parameter, decoder-only transformer model pretrained on a corpus of over 100 billion real-world time points.

TimesFM 1.0 generates accurate forecasts on unseen datasets without any additional training. It performs univariate forecasting: it uses one variable’s own history to predict that variable’s future values. It supports context lengths of up to 512 time points, arbitrary horizon lengths, and an optional frequency indicator input.
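Besides the DataFrame helper used later in this article, the library also exposes a lower-level forecast() method that takes raw arrays together with the optional frequency indicator (a categorical input where 0 denotes high-frequency data, 1 medium, and 2 low). A minimal sketch, assuming tfm has already been initialized and its checkpoint loaded as shown in the demo below:

import numpy as np

context = np.sin(np.linspace(0, 20, 512))   # a univariate context of up to 512 points
point_forecast, quantile_forecast = tfm.forecast(
    [context],   # a list of 1D arrays, one per series
    freq=[1],    # optional frequency indicator per series: 0 high, 1 medium, 2 low
)
print(point_forecast.shape)   # (1, horizon_len)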

TimesFM Architecture

Also read: Time series Forecasting: Complete Tutorial | Part-1

Parameters (Hyperparameters)

These are tunable values that control the behavior of the model and impact its performance:

  1. model_dim: Dimensionality of the input and output vectors.
  2. input_patch_len (p): Length of each input patch.
  3. output_patch_len (h): Length of the forecast generated in each step.
  4. num_heads: Number of attention heads in the multi-head attention mechanism.
  5. num_layers (nl): Number of stacked transformer layers.
  6. context length (L): The length of the historical data used for prediction.
  7. horizon length (H): The length of the forecast horizon.
  8. Number of input tokens (N): calculated as the total context length divided by the input patch length, N = L/p. Each of these tokens is fed into the transformer layers for processing (see the arithmetic sketch after this list).
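To make these relationships concrete, here is a small arithmetic sketch using the TimesFM 1.0 defaults that also appear later in this demo. This is illustration only, not the model’s internal code:

import math

L = 512   # context length: historical points the model attends to
p = 32    # input_patch_len: points per input patch
h = 128   # output_patch_len: points predicted per decode step
H = 24    # horizon length we forecast in the demo below

N = L // p                       # number of input tokens: N = L/p
decode_steps = math.ceil(H / h)  # autoregressive steps to cover the horizon

print(f"Input tokens: N = {N}")                                 # 16
print(f"Decode steps for a {H}-point horizon: {decode_steps}")  # 1

Because the output patch length (128) is much longer than the input patch length (32), long horizons are covered in few autoregressive steps, which keeps long-horizon forecasting fast.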

Components

These are the fundamental building blocks of the model’s architecture:

  1. Residual Blocks: Neural network blocks used to process input and output patches.
  2. Stacked Transformer: The core transformer layers in the model.
  3. t_j: The input tokens fed to the transformer layers, derived from the processed patches:

t_j = InputResidualBlock(ỹ_j ⊙ (1 − m̃_j)) + PE_j

where ỹ_j is the j-th patch of the input series, m̃_j is the corresponding mask, and PE_j is the positional encoding.

  4. o_j: The output token at step j, generated by the transformer layers from the input tokens seen so far. It is used to predict the corresponding output patch:

o_j = StackedTransformer((t_1, m̃_1), …, (t_j, m̃_j))

  5. m_{1:L} (mask): The mask used to ignore certain parts of the input during processing.

The loss function is used during training. In the case of point forecasting, it is the Mean Squared Error (MSE):

TrainLoss = (1 / N) * Σ_{j=1}^{N} MSE(ŷ_{pj+1:pj+h}, y_{pj+1:pj+h})

where ŷ are the model’s predictions, y the true future values, p the input patch length, and h the output patch length.
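As a purely illustrative sketch of this objective (not the actual TimesFM training code), the loop below assumes a hypothetical predict_fn that maps an observed context to the next h values, and averages the per-patch MSE:

import numpy as np

def train_loss(y, predict_fn, p=32, h=128):
    # Average MSE of the h-step forecast made after each prefix of j patches
    N = len(y) // p
    losses = []
    for j in range(1, N):
        context = y[: j * p]             # first j patches as context
        target = y[j * p : j * p + h]    # next h true values
        pred = predict_fn(context, len(target))
        losses.append(np.mean((pred - target) ** 2))
    return np.mean(losses)

# Example with a naive "repeat the last value" stand-in for the model
series = np.sin(np.arange(1024) / 10.0)
print(train_loss(series, lambda ctx, n: np.full(n, ctx[-1])))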

Also read: Introduction to Time Series Data Forecasting

TimesFM 1.0 for Forecasting

The “Electric Production” dataset is available on Kaggle and contains data related to electric production over time. It consists of only two columns: DATE, which represents the date of the recorded values, and Value, which indicates the amount of electricity produced in that month. Our task is to forecast 24 months of data using TimesFM.

Demo

Before we start, make sure that you’re using a GPU. I’m running this demonstration on Kaggle with the GPU T4 x2 accelerator.

Let’s install “timesfm” using pip; the “-q” flag installs it quietly, without printing the usual output.

!pip -q install timesfm

Let’s import a few necessary libraries and read the dataset.

import timesfm
import pandas as pd
# Load the dataset
data = pd.read_csv('/kaggle/input/electric-production/Electric_Production.csv')
data.head()
Output


data['DATE'] = pd.to_datetime(data['DATE'])
data.head()

We converted the DATE column to datetime; it is now in YYYY-MM-DD format.

# Let's visualise the data
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # Setting warnings to be ignored
sns.set(style="darkgrid")
plt.figure(figsize=(15, 6))
sns.lineplot(x="DATE", y='Value', data=data, color='green')
plt.title('Electric Production')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

Let’s look at the data:

Output
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Set index to DATE and decompose the data
data.set_index("DATE", inplace=True)
result = seasonal_decompose(data['Value'])
# Create a 2x2 grid for the subplots
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 10))
result.observed.plot(ax=ax1, color='darkgreen')
ax1.set_ylabel('Observed')
result.trend.plot(ax=ax2, color='darkgreen')
ax2.set_ylabel('Trend')
result.seasonal.plot(ax=ax3, color='darkgreen')
ax3.set_ylabel('Seasonal')
result.resid.plot(ax=ax4, color='darkgreen')
ax4.set_ylabel('Residual')
# Adjust layout and show the plots
plt.tight_layout()
plt.show()
# Reset the index after plotting
data.reset_index(inplace=True)

We can see the components of the time series, such as trend, seasonality, and residuals, and how each relates to time.

Output
# Reshape into the format TimesFM expects: unique_id, ds (date), y (value)
df = pd.DataFrame({'unique_id': [1] * len(data),
                   'ds': data["DATE"],
                   'y': data['Value']})
# Splitting into 94% and 6%
split_idx = int(len(df) * 0.94)
# Split the dataframe into train and test sets
train_df = df[:split_idx]
test_df = df[split_idx:]
print(train_df.shape, test_df.shape)
(373, 3) (24, 3)

Let’s forecast 24 months (2 years) of data, using the remaining earlier data as historical context.

# Initialize the TimesFM model with specified parameters
tfm = timesfm.TimesFm(
   context_len=128,       # Length of the context window for the model
   horizon_len=24,        # Forecasting horizon length
   input_patch_len=32,    # Length of input patches
   output_patch_len=128,  # Length of output patches
   num_layers=20,         # Number of stacked transformer layers
   model_dims=1280,       # Model dimensionality
)
# Load the pretrained model checkpoint
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")
# Forecasting the values using the TimesFM model
timesfm_forecast = tfm.forecast_on_df(
   inputs=train_df,       # Historical data used as forecasting context
   freq="MS",             # Frequency of the time-series data
   value_name="y",        # Name of the column containing the values to be forecasted
   num_jobs=-1,           # Set to -1 to use all available cores
)
timesfm_forecast = timesfm_forecast[["ds","timesfm"]]

The predictions are ready. Let’s look at both the actual and the predicted values.

timesfm_forecast.head()
           ds     timesfm
0  2016-02-01  111.673813
1  2016-03-01  100.474892
2  2016-04-01   89.024544
3  2016-05-01   90.391014
4  2016-06-01  100.934502
test_df.head()
     unique_id          ds         y
373          1  2016-02-01  106.6688
374          1  2016-03-01   95.3548
375          1  2016-04-01   89.3254
376          1  2016-05-01   90.7369
377          1  2016-06-01  104.0375
import numpy as np
actuals = test_df['y']
predicted_values = timesfm_forecast['timesfm']
# Convert to numpy arrays
actual_values = np.array(actuals)
predicted_values = np.array(predicted_values)
# Calculate error metrics
MAE = np.mean(np.abs(actual_values - predicted_values))  # Mean Absolute Error
MSE = np.mean((actual_values - predicted_values)**2)     # Mean Squared Error
RMSE = np.sqrt(np.mean((actual_values - predicted_values)**2))  # Root Mean Squared Error
# Print the error metrics
print(f"Mean Absolute Error (MAE): {MAE}")
print(f"Mean Squared Error (MSE): {MSE}")
print(f"Root Mean Squared Error (RMSE): {RMSE}")
Mean Absolute Error (MAE): 3.3446476043701163

Mean Squared Error (MSE): 22.60650784076036

Root Mean Squared Error (RMSE): 4.754630147630872
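If you prefer library helpers over the hand-rolled formulas above, scikit-learn (preinstalled on Kaggle) gives the same numbers as a cross-check:

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

print("MAE :", mean_absolute_error(actual_values, predicted_values))
mse = mean_squared_error(actual_values, predicted_values)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))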
# Let's Visualise the Data
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # Setting the warnings to be ignored
# Set the style for seaborn
sns.set(style="darkgrid")
# Plot size
plt.figure(figsize=(15, 6))
# Plot forecasted values
sns.lineplot(x="ds", y='timesfm', data=timesfm_forecast, color='red', label='Forecast')
# Plot actual time series data
sns.lineplot(x="DATE", y='Value', data=data, color='green', label='Actual Time Series')
# Set plot title and labels
plt.title('Electric Production: Actual vs Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
# Show the legend
plt.legend()
# Display the plot
plt.show()
Output

The predictions are close to the actual values, and the model scores well on the error metrics (MAE, MSE, RMSE) despite forecasting zero-shot.
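As an optional sanity check (not part of the original demo), we can compare TimesFM against a seasonal-naive baseline that simply repeats the last 12 training months across the 24-month horizon; a strong forecaster should beat this comfortably on a seasonal series:

import numpy as np

# Seasonal-naive baseline: repeat the final training year over the horizon
last_year = train_df['y'].values[-12:]
naive_forecast = np.tile(last_year, 2)[:len(actual_values)]
print(f"Seasonal-naive MAE: {np.mean(np.abs(actual_values - naive_forecast)):.2f}")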

Also read: A Comprehensive Guide to Time Series Analysis and Forecasting

Conclusion

In conclusion, TimesFM, a transformer-based pretrained model by Google Research, demonstrates impressive zero-shot forecasting capabilities for univariate time-series data. Its architecture and training on extensive datasets enable accurate predictions, showing the potential to streamline time-series analysis while approaching the accuracy of state-of-the-art models in various applications.

Are you looking for more articles on similar topics like this? Check out our Time Series articles.

Frequently Asked Questions

Q1. How would you explain MAE (Mean Absolute Error)?

Ans. The Mean Absolute Error (MAE) is the average of the absolute differences between predictions and actual values, providing a simple way to evaluate model performance. A smaller MAE implies more accurate forecasts and a more reliable model.
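For example, if the actual values are [100, 95, 90] and the forecasts are [103, 94, 88], the absolute errors are 3, 1, and 2, so MAE = (3 + 1 + 2) / 3 = 2.0.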

Q2. What does seasonality mean in time series analysis?

Ans. Seasonality refers to the regular, predictable variations in a time series that arise from seasonal influences; for example, annual retail sales often surge during the holiday period. Accounting for these patterns is important for accurate forecasting.

Q3. What is a trend in time series analysis?

Ans. A trend in time series data denotes a sustained direction or movement observed over time, which can be upward, downward, or stable. Identifying trends is crucial for comprehending the data’s long-term behavior, as it impacts forecasting and the effectiveness of the predictive model.

Q4. How does TimesFM forecast univariate time-series data?

Ans. The Time Series Foundation Model (TimesFM) predicts a single variable by examining its historical values. Using a decoder-only, transformer-based architecture, it produces forecasts from previous values of that variable.
