This article was published as a part of the Data Science Blogathon
Let us have a quick overview of this blog.
→What is time series?
→Real-life scenarios of time series
→Time series analysis
→Forecasting
→Types of forecasting
1) Quantitative forecasting
2) Qualitative forecasting
→Regression vs Time series
→Time Series components
→Analyzing kaggle time-series data
→Plotting the time-series graph
A time series is a sequence of data points recorded in time order, so the time component is present in every observation.
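As a minimal sketch of this definition, the snippet below builds a small time series in pandas. The heart-rate values and dates are made up for illustration; the point is only that each value is tied to a timestamp and that the timestamps are in order.

```python
import pandas as pd

# Hypothetical daily heart-rate readings (beats per minute); values are made up.
timestamps = pd.date_range(start="2021-01-01", periods=5, freq="D")
heart_rate = pd.Series([72, 75, 71, 78, 74], index=timestamps, name="heart_rate")

# The index carries the time component, so the order of the rows is meaningful.
print(heart_rate)
print(heart_rate.index.is_monotonic_increasing)
```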
Healthcare industry – Blood pressure monitoring, Heart rate monitoring.
Environment – Global temperature and air pollution levels.
Society – Birth rates over a period of time, Population, etc
Image source: https://www.statisticshowto.com/timeplot/
Analyzing this time series data with certain tools and techniques is called time series analysis.
For example, a restaurant can predict its daily visitors from time series data, so that the management can schedule and accommodate staff according to the expected number of visitors.
Forecasting is the process of using historical (past and present) data to predict future values.
1) Quantitative forecasting
2) Qualitative forecasting
Let us look at each of them.
Quantitative forecasting is based on historical data, i.e. past and present data, which is mostly numerical. Because we apply statistical methods to this data, the predictions carry less bias.
Qualitative forecasting is based on the opinions and judgment of subject matter experts and customers. Why rely on judgment instead of data? Because in some cases past data is unavailable or unclear, so we depend on judgment and opinions.
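To make the quantitative approach concrete, here is a minimal sketch of one very simple statistical method, a moving-average forecast. The visitor counts and the window size of 3 are assumptions chosen for illustration; real forecasting models are usually more elaborate.

```python
import pandas as pd

# Hypothetical daily visitor counts for a restaurant (made-up numbers).
visitors = pd.Series([120, 135, 128, 140, 150, 145, 155])

# Naive quantitative forecast: the mean of the last 3 observations.
window = 3
forecast_next_day = visitors.tail(window).mean()
print(forecast_next_day)
```

The forecast uses only past numerical data, which is what separates it from a qualitative forecast based on expert opinion.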
You may have some doubts about regression and time series. Both have some similarities and differences.
Both regression analysis and time series analysis are performed on continuous variables.
→Regression models the relationship between dependent and independent variables.
→The target variable is continuous.
→This involves finding patterns in the data and predicting the target with these patterns.
→A time series is a series of data points associated with time.
→The target variable is continuous.
→This involves finding trends in the data and forecasting the future with these trends.
Time series – https://i1.wp.com/statisticsbyjim.com/wp-content/uploads/2020/07/TimeSeriesTrade.png?fit=576%2C384&ssl=1
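One practical consequence of the comparison above is that the order of the observations matters in a time series, while ordinary regression usually treats rows as independent. The sketch below illustrates this on synthetic data (the trend, noise level, and seed are arbitrary assumptions): shuffling the same values removes the dependence between neighbouring points.

```python
import numpy as np
import pandas as pd

# Synthetic series with a steady upward trend plus small random noise.
rng = np.random.default_rng(seed=0)
values = np.arange(100) + rng.normal(scale=2.0, size=100)
ordered = pd.Series(values)

# Shuffling keeps the same values but destroys the time order.
shuffled = pd.Series(rng.permutation(values))

# Lag-1 autocorrelation: how strongly each point relates to the previous one.
print(ordered.autocorr(lag=1))
print(shuffled.autocorr(lag=1))
```

The ordered series should show a lag-1 autocorrelation close to 1, while the shuffled one should be close to 0.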
The time-series graph helps to highlight the trend and behavior of the data over time, which supports building a more reliable model. To understand these patterns, we should structure the data and break it down into several components. They are:
Structural breaks
Trend
Seasonality
Cyclicity
Noise
Level
A structural break is a sudden change in the behavior of the time series data. Structural breaks affect the reliability of the results, so statistical methods should be used to identify them.
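As a rough illustration of locating a structural break, the sketch below scans a synthetic series for the split point where the means of the two segments differ the most. This is a simplistic heuristic written for this example (the function name `find_mean_shift` and the data are hypothetical), not a formal statistical test such as the Chow test.

```python
import numpy as np

# Synthetic series: the level jumps from about 10 to about 20 at position 50.
rng = np.random.default_rng(seed=1)
series = np.concatenate([
    rng.normal(loc=10.0, scale=1.0, size=50),
    rng.normal(loc=20.0, scale=1.0, size=50),
])

def find_mean_shift(values, min_size=5):
    """Return the split index that maximizes the gap between segment means."""
    best_index, best_gap = None, 0.0
    for i in range(min_size, len(values) - min_size):
        gap = abs(values[:i].mean() - values[i:].mean())
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, best_gap

break_index, gap = find_mean_shift(series)
print(break_index, round(gap, 2))
```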
Time series data may contain a component that changes in proportion to time; this is the trend. In short, the trend shows whether the time series has moved higher or lower over a time period. The reliability of time series results depends on identifying time trends correctly.
For example, the monthly revenue of a company can show an increasing trend.
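One simple way to check the direction of a trend is to fit a straight line through the data and look at the sign of its slope. The sketch below does this with `numpy.polyfit`; the revenue figures are made up for illustration.

```python
import numpy as np

# Hypothetical monthly revenue figures (made-up numbers, arbitrary units).
revenue = np.array([100, 104, 110, 108, 115, 121, 125, 130, 128, 136, 140, 147])
months = np.arange(len(revenue))

# Fit a straight line: revenue is approximated by slope * month + intercept.
slope, intercept = np.polyfit(months, revenue, deg=1)

print(f"slope per month: {slope:.2f}")
print("increasing trend" if slope > 0 else "flat or decreasing trend")
```

A positive slope indicates that the series has moved higher over the period, and a negative slope that it has moved lower.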
Seasonality is a component in which the time series data shows a regular pattern that repeats after a fixed interval of time.
(An example of a time series with seasonality is sales, which often increase every 20 days.)
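A fixed repeating interval shows up clearly in the autocorrelation of a series. The sketch below assumes a synthetic monthly sales series with a 12-month period (the pattern, noise level, and period are assumptions, not the 20-day example above) and compares the correlation one full period back with the correlation half a period back.

```python
import numpy as np
import pandas as pd

# Synthetic monthly sales: a pattern that repeats exactly every 12 months,
# plus a little random noise. All numbers are made up.
rng = np.random.default_rng(seed=2)
pattern = np.array([5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 5, 8], dtype=float)
sales = pd.Series(100 + np.tile(pattern, 4) + rng.normal(scale=0.5, size=48))

# A seasonal series correlates strongly with itself one full period back,
# and weakly or negatively half a period back.
print(sales.autocorr(lag=12))
print(sales.autocorr(lag=6))
```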
Cyclicity is the component in which the time series data repeats after some interval of time, but the interval is not fixed.
Example:
Electricity demand per week is plotted in a time-series graph. The demand repeats roughly every 2 weeks, which represents cyclicity.
https://robjhyndman.com/hyndsight/2011-12-14-cyclicts_files/figure-html/unnamed-chunk-3-1.png
Noise is the random fluctuation in the time series data. Because it is random, we cannot use it to predict the future.
The level is the average value of the time series.
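The last two components can be sketched together. The example below assumes a synthetic series with a constant level of 50 and random noise (both values are arbitrary): the level is estimated as the mean, and the noise is whatever remains after subtracting it.

```python
import numpy as np
import pandas as pd

# Synthetic series: constant level of 50 plus random noise (made-up data).
rng = np.random.default_rng(seed=3)
series = pd.Series(50 + rng.normal(scale=2.0, size=200))

# Level: the average value of the series.
level = series.mean()

# Noise: what is left after removing the level. It has no usable pattern,
# so its mean is close to zero and its lag-1 autocorrelation is weak.
noise = series - level
print(round(level, 2))
print(round(noise.mean(), 6), round(noise.autocorr(lag=1), 2))
```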
In this analysis, I have used a dataset from Kaggle. Kaggle is a platform where we can find datasets, notebooks, and other resources related to data science. It also hosts competitions for practice.
Dataset used in this analysis: Time series starter dataset
import pandas as pd
data = pd.read_csv('/content/sample_data/Month_Value_1.csv')
data.head()
This dataset contains 5 columns and 96 rows.
The columns are
[0] – Period
[1] – Revenue
[2] – Sales_quantity
[3] – Average_cost
[4] – The_average_annual_payroll_of_the_region
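The row and column counts, and the column names listed above, can be checked directly on a DataFrame. The sketch below uses a small stand-in frame with the same five column names and made-up values, since the CSV file itself is not available here; on the real `data` the same calls apply.

```python
import pandas as pd

# Stand-in frame with the article's five column names; the values are made up.
sample = pd.DataFrame({
    "Period": ["01.01.2015", "01.02.2015"],
    "Revenue": [100.0, 110.0],
    "Sales_quantity": [10, 11],
    "Average_cost": [10.0, 10.0],
    "The_average_annual_payroll_of_the_region": [1000, 1000],
})

n_rows, n_cols = sample.shape      # (rows, columns)
print(n_rows, n_cols)
print(list(sample.columns))
sample.info()                      # column types and non-null counts
```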
Description of each column to decide which is important
Period – The date of each observation, given monthly from 2015 to 2020.
Revenue – Company’s revenue for each month from 2015 to 2020.
Sales_quantity – Company’s sales quantity
Average_cost – Average cost of production
The_average_annual_payroll_of_the_region – The average number of employees in the region per year.
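Because Period holds dates, it is often convenient to convert it to real timestamps and use it as the index. The sketch below works on a small in-memory stand-in for the CSV; the values and the day.month.year date format are assumptions for illustration and should be checked against the actual file.

```python
import io
import pandas as pd

# Stand-in for the first rows of the CSV. The values and the day.month.year
# date format are assumptions for illustration, not taken from the dataset.
csv_text = """Period,Revenue
01.01.2015,100.0
01.02.2015,110.0
01.03.2015,125.0
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Convert the text dates to real timestamps and use them as the index.
sample["Period"] = pd.to_datetime(sample["Period"], format="%d.%m.%Y")
sample = sample.set_index("Period")
print(sample.index)
```

With a datetime index, pandas can label the time axis of a plot with dates instead of row numbers.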
Plotting the line chart for 5 columns
data.plot.line(x=None, y=None)
This plots the data from all 5 columns on one chart, so it does not give a clear view of any single series.
Let us clean the dataset.
We can analyze the time series of revenue from 2015 to 2020 and drop all other columns now.
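data = data.drop('Sales_quantity', axis=1)
data = data.drop('Average_cost', axis=1)
data = data.drop('The_average_annual_payroll_of_the_region', axis=1)
The syntax for dropping a column is
dataframe.drop('Column_name', axis=1)
where axis is the axis number (0 for rows and 1 for columns). Recent pandas versions no longer accept the axis as a second positional argument, so it is passed by keyword here.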
Now we have only the Period and Revenue columns for analysis.
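The same result can be reached in a single step. The sketch below shows two equivalent alternatives on a stand-in frame with made-up values: dropping several columns at once with the `columns=` keyword, or selecting only the columns to keep.

```python
import pandas as pd

# Stand-in frame with the article's column names; the values are made up.
sample = pd.DataFrame({
    "Period": ["01.01.2015", "01.02.2015"],
    "Revenue": [100.0, 110.0],
    "Sales_quantity": [10, 11],
    "Average_cost": [10.0, 10.0],
    "The_average_annual_payroll_of_the_region": [1000, 1000],
})

# Option 1: drop several columns in one call using the columns= keyword.
dropped = sample.drop(columns=[
    "Sales_quantity",
    "Average_cost",
    "The_average_annual_payroll_of_the_region",
])

# Option 2: select the columns to keep instead of listing the ones to remove.
selected = sample[["Period", "Revenue"]]

print(dropped.equals(selected))
```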
Let us plot the graph
data.plot.line(x=None,y=None)
This time-series graph shows an increasing trend: the company's revenue rises from 2015 to 2020.
You can take a look at this time series notebook for the code:
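A trend read from a graph can also be checked numerically, for instance by comparing the average revenue of each calendar year. The sketch below uses made-up monthly revenue over two years as a stand-in for the real column; the grouping approach is the part that carries over.

```python
import pandas as pd

# Made-up monthly revenue with a datetime index, standing in for the real data.
periods = pd.date_range("2015-01-01", periods=24, freq="MS")
revenue = pd.Series([100 + 5 * i for i in range(24)], index=periods, name="Revenue")

# Average revenue per calendar year: rising yearly means confirm the trend.
yearly_mean = revenue.groupby(revenue.index.year).mean()
print(yearly_mean)
print(yearly_mean.is_monotonic_increasing)
```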
Time series starter dataset notebook
We have seen some concepts of time series analysis and analyzed Kaggle’s starter dataset for time series.
Thanks for reading!
I hope you enjoyed the article and learned more about time series analysis. Please feel free to contact me at [email protected] or on LinkedIn.
Want to share your thoughts? Feel free to comment below
About the author
Currently, I am pursuing my Bachelor of Engineering (B.E) in Computer Science from the Government College of Engineering, Srirangam, Tamil Nadu. I am very enthusiastic about Statistics, Machine Learning, and Data Science.
Connect with me on Linkedin Mohamed Illiyas
The media shown in this article explaining how to Deploy Streamlit Application on Heroku are not owned by Analytics Vidhya and are used at the Author’s discretion.