Time series data, in this case Netflix stock prices, is not just a collection of numbers. It is a captivating tapestry that weaves together the intricate story of our world, and Pandas gives us the tools to read it. Like a mystical thread, it captures the ebb and flow of events, the rise and fall of trends, and the emergence of patterns. It reveals the hidden connections and correlations that shape our reality, painting a vivid picture of the past and offering glimpses into the future.
Time series analysis is more than just a tool. It is a gateway to a realm of knowledge and foresight. It empowers you to unlock the secrets hidden within the temporal fabric of data, transforming raw information into valuable insights, and it guides you in making informed decisions, mitigating risks, and capitalizing on emerging opportunities.
Let’s embark on this exciting adventure together and discover how time truly holds the key to understanding our world. Are you ready? Let’s dive into the captivating realm of time series analysis!
This article was published as a part of the Data Science Blogathon.
A time series is a sequence of data points collected or recorded over successive and equally spaced intervals of time.
A time series has four components: trend, seasonality, cyclical variations, and random (irregular) variations.
Here is a visual interpretation of the various components of a time series.
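To make the four components concrete, the sketch below builds a purely synthetic monthly series by adding a trend, a 12-month seasonal pattern, a slower cycle, and random noise. All values, periods, and the additive model itself are illustrative assumptions, not properties of the Netflix data; real cyclical swings are rarely a clean sine wave.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data (NOT Netflix prices): 48 month-start dates
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(len(idx))

trend = 100 + 2.0 * t                           # long-term upward direction
seasonality = 10 * np.sin(2 * np.pi * t / 12)   # pattern repeating every 12 months
cycle = 5 * np.sin(2 * np.pi * t / 36)          # slower 3-year swing (simplified cycle)
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=2.0, size=len(idx))  # random variation

# Additive model (an assumption): each observation is the sum of the components
components = pd.DataFrame(
    {"trend": trend, "seasonality": seasonality, "cycle": cycle, "noise": noise},
    index=idx,
)
series = components.sum(axis=1).rename("value")
print(series.head())
```

Plotting `components` with `components.plot(subplots=True)` gives a rough, hand-made version of the component view described above.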
Let’s now put this into practice with yfinance, a Python library for downloading market data from Yahoo! Finance. First, install it with the following command:
!pip install yfinance
Please be aware that if you encounter errors while running this code on your local machine (for example, in Jupyter Notebook), you have two options: update your Python environment, or use a cloud-based notebook such as Google Colab instead.
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime
In this demo, we will use Netflix’s stock data (ticker: NFLX).
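# auto_adjust=False keeps the 'Adj Close' column used later in this article
df = yf.download(tickers="NFLX", auto_adjust=False)
# newer yfinance versions return (Price, Ticker) MultiIndex columns; keep only the price level
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df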
Let’s examine the columns in detail for further analysis:
# print the metadata of the dataset
df.info()
# data description
df.describe()
df['Open'].plot(figsize=(12,6),c='g')
plt.title("Netflix's Stock Prices")
plt.show()
Netflix’s stock price rose substantially between 2002 and 2021. We shall use Pandas to investigate it further in the coming sections.
Because Pandas was developed in the context of financial modeling, it provides a rich array of tools for handling dates, times, and time-indexed data. Now, let’s explore the key Pandas operations for manipulating time series data.
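The operations in the following sections all rely on a `DatetimeIndex`, the index type that `yf.download` returns for daily data. The sketch below uses hypothetical toy prices (not the NFLX download) to show the building blocks: a `DatetimeIndex` made of `Timestamp` values, partial-string indexing to select a whole month, and subtraction of timestamps yielding a `Timedelta`.

```python
import pandas as pd

# Hypothetical toy prices on six business days (not real Netflix data)
idx = pd.date_range("2021-12-28", periods=6, freq="B")  # a DatetimeIndex
prices = pd.Series([600.0, 605.0, 602.0, 598.0, 597.0, 610.0], index=idx)

print(type(prices.index).__name__)  # DatetimeIndex
print(prices.index[0])              # a single Timestamp

# Partial-string indexing: every row that falls in January 2022
january = prices.loc["2022-01"]
print(january)

# Subtracting two Timestamps gives a Timedelta
span = prices.index[-1] - prices.index[0]
print(span)
```

The same `df.loc["2020"]` or `df.loc["2020-03"]` style of selection works on the NFLX DataFrame, since its index is also a `DatetimeIndex`.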
Time shifting, also known as lagging, moves the values of a time series forward or backward in time by a specified number of periods. By default, pandas shifts the values while the index stays in place.
Below is the original dataset, before any shifting:
There are two common types of time shifting:
1.1 Forward Shifting (Positive Lag)
To shift our data forwards, the number of periods (or increments) must be positive.
df.shift(1)
Note: The first row of the shifted data contains NaN values, since there is no previous row to shift into it.
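A common reason to shift forward is to line up each day’s value with the previous day’s, for example to compute daily returns as today / yesterday - 1. This is a minimal sketch on four hypothetical closing prices rather than the NFLX data.

```python
import pandas as pd

# Hypothetical closing prices on four consecutive days (illustrative only)
close = pd.Series(
    [100.0, 110.0, 99.0, 108.9],
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
)

lagged = close.shift(1)  # yesterday's close aligned with today's date
print(pd.DataFrame({"close": close, "lagged": lagged}))

# Daily return: today's close divided by yesterday's close, minus 1
daily_return = close / close.shift(1) - 1
print(daily_return)      # first value is NaN, no prior day to compare with
```

On the downloaded data, the same idea would be `df['Close'] / df['Close'].shift(1) - 1`.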
1.2 Backward Shifting (Negative Lag)
To shift our data backwards, the number of periods (or increments) must be negative.
df.shift(-1)
Note: The last row of the shifted data contains NaN values, since there is no subsequent row to shift into it.
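A negative shift is often used to place the next day’s value on the current row, for instance as a target when forecasting. `shift` also accepts a `freq` argument, in which case the index moves instead of the values and nothing is lost. A small sketch on a hypothetical three-day series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=pd.date_range("2023-01-02", periods=3, freq="D"))

# Values move up one row, index stays: the last row becomes NaN
next_day = s.shift(-1)
print(next_day)

# With freq="D", the timestamps move forward one day and all values are kept
moved_index = s.shift(1, freq="D")
print(moved_index)
```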
Rolling is a powerful transformation method used to smooth out data and reduce noise. It slides a fixed-size window across the data and applies an aggregation function, such as mean(), median(), or sum(), to the values within each window.
df['Open:10 days rolling'] = df['Open'].rolling(10).mean()
df[['Open','Open:10 days rolling']].head(20)
df[['Open','Open:10 days rolling']].plot(figsize=(15,5))
plt.show()
Note: The first nine values are NaN because a window of ten rows (ten trading days) needs ten observations before it can produce a mean.
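To see exactly what `rolling(...).mean()` computes, the sketch below compares it with a hand-written average of the current value and the `window - 1` values before it. The six-value series is hypothetical, and a window of 3 is used instead of 10 so the output stays short.

```python
import pandas as pd

# Toy series (illustrative, not Netflix data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

window = 3
rolled = s.rolling(window).mean()

# Manual equivalent: mean of positions i - window + 1 .. i, NaN until a full window exists
manual = pd.Series(
    [s.iloc[i - window + 1 : i + 1].mean() if i >= window - 1 else float("nan")
     for i in range(len(s))]
)
print(pd.DataFrame({"value": s, "rolling": rolled, "manual": manual}))
```

As in the note above, the first `window - 1` results are NaN.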
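# 'Open:10' is needed by the plot below; min_periods=1 avoids leading NaNs
df['Open:10'] = df['Open'].rolling(window=10,min_periods=1).mean()
df['Open:20'] = df['Open'].rolling(window=20,min_periods=1).mean()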
df['Open:50'] = df['Open'].rolling(window=50,min_periods=1).mean()
df['Open:100'] = df['Open'].rolling(window=100,min_periods=1).mean()
#visualization
df[['Open','Open:10','Open:20','Open:50','Open:100']].plot(xlim=['2015-01-01','2024-01-01'])
plt.show()
Rolling averages like these are commonly used to smooth plots in time series analysis: they reduce the inherent noise and short-term fluctuations in the data, making underlying trends and patterns easier to see.
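The `min_periods=1` argument used above changes how the start of the series is handled, and the aggregation mentioned earlier (mean, median, sum) can be swapped freely. This sketch uses a hypothetical four-value series with a window of 3.

```python
import pandas as pd

s = pd.Series([4.0, 8.0, 6.0, 2.0])

strict = s.rolling(window=3).mean()                  # NaN until 3 values exist
lenient = s.rolling(window=3, min_periods=1).mean()  # averages whatever is available so far
rolling_median = s.rolling(window=3).median()        # other aggregations work the same way
rolling_sum = s.rolling(window=3).sum()

print(pd.DataFrame({"value": s, "strict": strict, "lenient": lenient,
                    "median": rolling_median, "sum": rolling_sum}))
```

On a `DatetimeIndex` such as the NFLX data, `rolling(10)` counts rows (trading days); an offset string such as `rolling('14D')` uses a calendar-based window instead.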
Time resampling involves aggregating data into predetermined time intervals, such as monthly, quarterly, or yearly, to provide a summarized view of the underlying trends. Instead of examining data on a daily basis, resampling condenses the information into larger time units, allowing analysts to focus on broader patterns and trends rather than getting caught up in daily fluctuations.
#year end frequency ('A'; pandas 2.2 and later prefer the alias 'YE')
df.resample(rule='A').max()
This resamples the original DataFrame df based on the year-end frequency, and then calculates the maximum value for each year. This can be useful in analyzing the yearly highest stock price or identifying peak values in other time series data.
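The sketch below shows how year-end resampling groups rows, using four hypothetical daily values that straddle New Year. It passes the offset object `pd.offsets.YearEnd()` (the object behind the year-end alias) instead of a string, which sidesteps the alias renames between pandas versions.

```python
import pandas as pd

# Hypothetical daily values spanning two calendar years
idx = pd.date_range("2021-12-30", "2022-01-02", freq="D")
s = pd.Series([10.0, 30.0, 20.0, 5.0], index=idx)

# Each bin ends on Dec 31 and is labelled with that date
yearly_max = s.resample(pd.offsets.YearEnd()).max()
print(yearly_max)
```

Dec 30 and Dec 31 fall in the 2021 bin, Jan 1 and Jan 2 in the 2022 bin, so each label carries that year’s highest value.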
df['Adj Close'].resample(rule='3Y').mean().plot(kind='bar',figsize=(10,4))
plt.title('3 Year End Mean Adj Close Price for Netflix')
plt.show()
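This bar plot shows the mean Adj Close price of Netflix stock for each three-year period from 2002 to 2023.
Below is a list of common offset aliases; the full list can be found in the pandas documentation. Note that pandas 2.2 renamed several end-of-period aliases (for example, M became ME, Q became QE, and A/Y became YE), and the older spellings now emit deprecation warnings.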
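The resample-then-aggregate pattern can also summarize each period of a price series in one step. As a hedged sketch on hypothetical prices (not the NFLX download), `ohlc()` reports the first, highest, lowest, and last value in each monthly bin; `pd.offsets.MonthEnd()` is used instead of a string alias so the code behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical daily closes across two months (illustrative only)
idx = pd.to_datetime(["2023-01-03", "2023-01-17", "2023-01-31",
                      "2023-02-01", "2023-02-15", "2023-02-28"])
close = pd.Series([100.0, 120.0, 110.0, 105.0, 95.0, 115.0], index=idx)

# Monthly open/high/low/close built from the daily values
monthly = close.resample(pd.offsets.MonthEnd()).ohlc()
print(monthly)
```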
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A, Y | year end frequency |
BA, BY | business year end frequency |
AS, YS | year start frequency |
BAS, BYS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
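A quick way to see what an alias means is to generate a few timestamps with `pd.date_range`. This is a small sketch: the start dates are arbitrary examples, and only aliases that are not deprecated in current pandas are used.

```python
import pandas as pd

days = pd.date_range("2024-01-01", periods=3, freq="D")          # calendar days
business = pd.date_range("2024-01-05", periods=3, freq="B")      # Friday, then skips the weekend
weekly = pd.date_range("2024-01-01", periods=2, freq="W")        # weekly, anchored on Sundays
month_starts = pd.date_range("2024-01-15", periods=3, freq="MS") # rolls forward to month starts

for name, rng in [("D", days), ("B", business), ("W", weekly), ("MS", month_starts)]:
    print(name, list(rng.strftime("%Y-%m-%d %a")))
```

Note that `B` skips weekends but not public holidays, so it will not line up exactly with the trading days in stock data.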
Python’s pandas library is an incredibly robust and versatile toolset that offers a plethora of built-in functions for effectively analyzing time series data. In this article, we explored the immense capabilities of pandas for handling and visualizing time series data.
Throughout the article, we delved into essential tasks such as time resampling, time shifting, and rolling analysis using Netflix stock data. These fundamental operations serve as crucial initial steps in any time series analysis workflow. By mastering these techniques, analysts can gain valuable insights and extract meaningful information from their data. Another way we could use this data would be to predict Netflix’s stock prices for the next few days by employing machine learning techniques. This would be particularly valuable for shareholders seeking insights and analysis.
The code and implementation are available on GitHub at Netflix Time Series Analysis.
Hope you found this article useful. Connect with me on LinkedIn.
Time series analysis is a statistical technique used to analyze patterns, trends, and seasonality in data collected over time. It is widely used to make predictions and forecasts, understand underlying patterns, and make data-driven decisions in fields such as finance, economics, and meteorology.
The main components of a time series are trend, seasonality, cyclical variations, and random variations. Trend represents the long-term direction of the data, seasonality refers to regular patterns that repeat at fixed intervals, cyclical variations correspond to longer-term economic cycles, and random variations are unpredictable fluctuations.
Time series analysis poses challenges such as handling irregular or missing data, dealing with outliers and noise, identifying and removing seasonality, selecting appropriate forecasting models, and evaluating forecast accuracy. The presence of trends and complex patterns also adds complexity to the analysis.
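Handling missing data, the first challenge listed above, often starts with placing the series on a regular calendar. Below is a minimal pandas sketch on hypothetical prices with one missing business day; forward filling and linear interpolation are two common choices, not the only valid ones.

```python
import pandas as pd

# Hypothetical prices with a missing business day (2023-01-04 is absent)
idx = pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-05", "2023-01-06"])
prices = pd.Series([100.0, 102.0, 101.0, 103.0], index=idx)

# Reindex onto a regular business-day grid; the gap becomes NaN
regular = prices.asfreq("B")

# Option 1: carry the last known price forward
filled = regular.ffill()

# Option 2: linear interpolation between neighbouring values
interpolated = regular.interpolate()

print(pd.DataFrame({"regular": regular, "ffill": filled, "interpolated": interpolated}))
```

With real stock data, `asfreq("B")` also creates NaN rows for market holidays, so the choice of fill method matters.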
Time series analysis finds applications in finance for predicting stock prices, economics for analyzing economic indicators, meteorology for weather forecasting, and various industries for sales forecasting, demand planning, and anomaly detection. These applications leverage time series analysis to make data-driven predictions and decisions.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.