This article was published as a part of the Data Science Blogathon.
Even though there are myriad complex methods and systems aimed at trying to forecast future stock prices, the simple method of linear regression does help to understand the past trend and is used by professionals as well as beginners to try and extrapolate the existing or past trend into the future. If the existing trend carries on into the future then you could have a potential winner.
A caveat needs to be added: Nothing can determine with any assurance that the future will turn out to be exactly like the past and so this method like other forecasting methods despite being fundamentally useful has its limitations.
With this in mind, let’s try and figure out the future stock prices of Infosys (NSE Symbol: INFY). Let’s recap the concept of linear regression, choose an arbitrary time frame, take the past data, apply the method, identify the past trend, and check the results.
The simplest form of the regression equation with one dependent and one independent variable is defined by the formula
y = m*x + c
where y is the estimated dependent variable, m is the regression coefficient, or what is commonly called the slope, x is the independent variable and c is a constant. In simple words, y is the output when m, x, and c are used as inputs.
Linear regression does try to predict trends and future values. It essentially, though not with pin-point accuracy can answer questions like,
This is as far as future stock prices or the financial markets go. It of course is much more useful and extensively used outside of the financial markets too. However, for this article, we will examine its usefulness in forecasting the stock trend.
Taking the past prices of INFY from 09-Apr-2020 till 10-Jul-2020, a total of 60 trading days or roughly 3 calendar months and by following the below procedure, we will get the past 60-day trend of Infosys.
Take the past 60 days close price data in excel. I’m showing only 3 days of data for illustration purposes. (Data source– https://in.finance.yahoo.com/quote/INFY.NS/history?p=INFY.NS)
Symbol | Date | Close |
INFY |
16-Apr-20 |
623.85 |
INFY |
17-Apr-20 |
628.75 |
INFY |
20-Apr-20 |
653.3 |
Select the date and close columns for the 60 values, insert a scatter plot like below.
Select the quick layout as fx as shown below.
To get the linear trend as shown below.
Based on the past values, excel has calculated the slope, m= 1.3312 which means on average the stock of Infosys has increased by 1.33 Rs. during this period. The c value which is the constant is a minus 57849.
Let’s look at what the forecasted values look like if we assume this trend continues into the future. Take the next 60 days as the test data in order to compare the forecasts with the actual close date. This will start from 13-Jul-2020 and extend till 05-Oct-2020 (till recently).
Forecasted value, y = 1.3312*x – 57489
Apply the above formula to all the rows of the excel. Remember x is the date here and so you have to convert the result into a number to get the correct result like below.
Next, compare the actual close with the forecasted value and get the percentage difference between these values.
I would not get into the statistical interpretation of the forecasts and for now, only consider how much of the forecast is versus the actual value. i.e. actual close minus forecast divided by the forecast.
Once you do this assignment, you will find that the percentage difference ranged from a minimum of 3% (good forecast) on 14-Jul-2020 to a maximum of 19% on 31-Jul-2020 (not a good one) and tells you the nature of linear trend which is a proportionate change over a period of time.
We knew from the past data the rate of the linear increase, we knew that we do not know whether the future is going to be better than the past or worse than the past or equal to the past.
We simply assumed that it will be equal to the past which is the standard assumption one makes when using the simple linear regression model. However, the actual future data in our example proved that the future was indeed better than the past and if it is so then who in this world would complain.
Disclaimer: This is not Investment Advice or a suggested approach to forecast future prices and this article is purely meant for educational purposes. The goal is not to predict future values but examine the usefulness of the linear regression method to extrapolate the past. Any investment or trade made using the technique discussed above should be at the concerned person’s risk.
I have an article where I have successfully used Piecewise Linear regression and optimization techniques for portfolio optimization .
I understand that you want to introduce linear regression, however: We know that 0) Stock prices cannot be linear in time since prices are positive 1) The deviations from linear model are NOT normally distributed -which is a check every Data Scientist should be doing, but a lot of Data Smiths leave out- it is not even mathematically valid to do linear regression on a linear model of growth in stock prices 2) The long term trend in stock prices is exponential (either growth or decay, look at RocketFuel stock RFI while it was public) And 3) In fact the deviations from exponential growth are log-normally distributed (Okay, that was lazy. More correctly, the deviations in log(price) space are better modeled as normally distributed.) So a solution, which is almost as easy, is to take the log(price), model it that as linear growth in time (corresponding to exponential price growth) and do linear regression of the exponential model.
Hi Ranjeet, thanks for your comments.it would be a great idea to look for more appropriate models that will fit the data. Log normal distribution is one such. Here I tried a very simple exercise to check if someone can find it useful and of course like I have highlighted the limitations and and as well as the usefulness of the linear model. Yes we assume lots of things are linear in life but we're wrong and the practical world does not work that way. One crucial point that I'd like to highlight is the time frame of the data that you're using. Will that impact the results. Will we get better results if we change the time frame. Regards Allwyn
Great article. You can do your Analysis better in Excel by using iXBRLAnalyst Addin for Excel. It adds a new built-in function "SharePrice(ticker, date)".