Stock Market Prediction Using Machine Learning

Prashant Last Updated : 04 Apr, 2025

Stock market prediction has long been an active area of research in machine learning. Algorithms such as regression models, classifiers, and support vector machines (SVMs) are all used to help predict the stock market. This article presents a simple implementation of analyzing and forecasting stock prices using machine learning. The case study focuses on Microsoft Corporation (MSFT) stock and uses a Long Short-Term Memory (LSTM) network to predict its price.

In this article, you will explore stock market prediction using machine learning, walk through a stock prediction model built with an LSTM, and see how that model is trained and checked against historical data.

Learning Objectives

In this tutorial, we will learn how to predict stock prices using a Long Short-Term Memory (LSTM) network for time series forecasting.
We will walk through each step of stock market prediction using LSTM, from loading the data to plotting the predictions.

This article was published as a part of the Data Science Blogathon.

What is the Stock Market?
Importance of Stock Market
What is Stock Market Prediction? [Problem Statement]
Stock Market Prediction Using the Long Short-Term Memory Method
Conclusion

What is the Stock Market?

The stock market is the collection of markets where stocks and other securities are bought and sold by investors. Publicly traded companies offer shares of ownership to the public, and those shares can be bought and sold on the stock market. Investors can make money by buying shares of a company at a low price and selling them at a higher price. The stock market is a key component of the global economy, providing businesses with funding for growth and expansion. It is also a popular way for individuals to invest and grow their wealth over time.

Importance of Stock Market

Importance	Description
Capital Formation	It provides a source of capital for companies to raise funds for growth and expansion.
Investment Opportunities	Investors can potentially grow their wealth over time by investing in the stock market.
Economic Indicators	The stock market can indicate the overall health of the economy.
Job Creation	Publicly traded companies often create jobs and contribute to the economy’s growth.
Corporate Governance	Shareholders can hold companies accountable for their actions and decision-making processes.
Risk Management	Investors can use the stock market to manage their investment risk by diversifying their portfolio.
Market Efficiency	The stock market helps allocate resources efficiently by directing investments to companies with promising prospects.

What is Stock Market Prediction? [Problem Statement]

Before we begin implementing the software to anticipate stock market values, let us look at the data we will be working with. In this section, we examine the stock price of Microsoft Corporation (MSFT) as reported by the National Association of Securities Dealers Automated Quotations (NASDAQ). The stock price data is supplied as a comma-separated values (.csv) file that can be opened and analyzed in Excel or another spreadsheet program.

MSFT's stock is listed on NASDAQ, and its value is updated every working day of the stock market. The market does not trade on Saturdays and Sundays, so the dates in the data skip over each weekend. For each date, the data gives the Opening Value of the stock, the Highest and Lowest values on that day, and the Closing Value at the end of the day. Analyzing this data can be useful for stock market prediction using machine learning techniques.
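The weekend gap is easy to see by measuring the calendar distance between consecutive trading dates. The sketch below uses the five dates from the first rows of the dataset shown in this article; the helper name `calendar_gaps` is our own and not part of any library.

```python
from datetime import date

# Trading dates copied from the first rows of the dataset shown in this article.
trading_days = [
    date(1990, 1, 2),
    date(1990, 1, 3),
    date(1990, 1, 4),
    date(1990, 1, 5),
    date(1990, 1, 8),
]

def calendar_gaps(days):
    """Return the number of calendar days between consecutive trading dates."""
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]

gaps = calendar_gaps(trading_days)
print(gaps)  # [1, 1, 1, 3]: the 3 is the weekend between Friday and Monday
```

A model that treats rows as equally spaced steps ignores these gaps, which is the simplification this tutorial makes.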

The Adjusted Close Value is the closing price adjusted for corporate actions such as dividends and stock splits. The data also provides the total volume of shares traded each day. With this information, it is the job of a machine learning engineer or data scientist to look at the data and develop algorithms that can extract patterns from the historical data of the Microsoft Corporation stock.

Stock Market Prediction Using the Long Short-Term Memory Method

We will use the Long Short-Term Memory (LSTM) method to create a machine learning model to forecast Microsoft Corporation stock values. Long Short-Term Memory (LSTM) is a deep learning artificial recurrent neural network (RNN) architecture. Its cells carry a state from one step to the next, and gates make small changes to that information by multiplying and adding.

Unlike traditional feed-forward neural networks, LSTM has feedback connections. It can handle single data points (such as pictures) as well as full data sequences (such as speech or video).
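To make the "multiplying and adding" concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. It is an illustration under our own assumptions, not the Keras implementation: the weights are random, the gate ordering inside the stacked matrices is our choice, and it uses the textbook tanh activation rather than the relu activation selected later in this article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.

    W: (4*units, n_features), U: (4*units, units), b: (4*units,)
    Stacked gate order (our choice): input, forget, candidate, output.
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # feedback: h_prev re-enters the cell
    i = sigmoid(z[0:units])                # input gate
    f = sigmoid(z[units:2 * units])        # forget gate
    g = np.tanh(z[2 * units:3 * units])    # candidate values
    o = sigmoid(z[3 * units:4 * units])    # output gate
    c_t = f * c_prev + i * g               # multiply (forget) and add (new info)
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, units = 4, 3                   # toy sizes for illustration
W = rng.normal(size=(4 * units, n_features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, n_features)):   # a toy sequence of 5 steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The hidden state `h` and cell state `c` returned by one step are fed into the next, which is the feedback connection that a feed-forward network lacks.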

Program Implementation

We will now go to the section where we will utilize Machine Learning techniques in Python to estimate the stock value using the LSTM.

Step 1: Importing the Libraries

The first step is to import the libraries required to preprocess the Microsoft Corporation stock data and the libraries required for constructing the LSTM model and visualizing its outputs. We will use the Keras library from the TensorFlow framework for the model; the preprocessing and evaluation utilities come from pandas, NumPy, Matplotlib, and scikit-learn.

#Importing the Libraries
import pandas as pd
import numpy as np
#Jupyter magic: remove this line when running as a plain Python script
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import linear_model
import keras.backend as K
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
#plot_model also needs the pydot and graphviz packages installed
from keras.utils import plot_model

Step 2: Getting to Visualising the Stock Market Prediction Data

Using the pandas read_csv function, we load the stock data from the local system as a comma-separated values (.csv) file and store it in a pandas DataFrame. Finally, we examine the first few rows of the data.

#Get the Dataset
df = pd.read_csv("MicrosoftStockData.csv", na_values=['null'], index_col='Date', parse_dates=True)
df.head()

Step 3: Checking for Null Values by Printing the DataFrame Shape

In this step, we first print the shape of the dataset. We then check for null values in the DataFrame to make sure there are none. Null values cause issues during training because missing entries propagate through the model's computations and can make the loss undefined.

#Print the shape of Dataframe  and Check for Null Values
print("Dataframe Shape: ", df.shape)
print("Null Value Present: ", df.isnull().values.any())
Output:
>> Dataframe Shape: (7334, 6)
>>Null Value Present: False

Date	Open	High	Low	Close	Adj Close	Volume
1990-01-02	0.605903	0.616319	0.598090	0.616319	0.447268	53033600
1990-01-03	0.621528	0.626736	0.614583	0.619792	0.449788	113772800
1990-01-04	0.619792	0.638889	0.616319	0.638021	0.463017	125740800
1990-01-05	0.635417	0.638889	0.621528	0.622396	0.451678	69564800
1990-01-08	0.621528	0.631944	0.614583	0.631944	0.458607	58982400

Step 4: Plotting the True Adjusted Close Value

The Adjusted Close Value is the final output value that will be forecasted using the machine learning model. It is the stock's closing price on a given day of trading, adjusted for corporate actions such as dividends.

#Plot the True Adj Close Value
df['Adj Close'].plot()

Figure: Plot of the Adj Close value over time.

Step 5: Setting the Target Variable and Selecting the Features

In this step, the output column is assigned to the target variable. Here it is the Adjusted Close value of Microsoft stock. We then pick the features that serve as the independent variables for the target (dependent) variable. We choose four features for training:

Open
High
Low
Volume

#Set Target Variable
output_var = pd.DataFrame(df['Adj Close'])
#Selecting the Features
features = ['Open', 'High', 'Low', 'Volume']

Step 6: Scaling

The features have very different magnitudes: in the rows shown above, prices are below 1 USD while volumes run into the tens of millions. To keep the large values from dominating training, we scale each feature to a value between 0 and 1. With all inputs in a comparable range, gradient-based training generally converges more easily. To perform this, we use the MinMaxScaler class of the scikit-learn library.

#Scaling
scaler = MinMaxScaler()
feature_transform = scaler.fit_transform(df[features])
feature_transform= pd.DataFrame(columns=features, data=feature_transform, index=df.index)
feature_transform.head()

Date	Open	High	Low	Volume
1990-01-02	0.000129	0.000105	0.000129	0.064837
1990-01-03	0.000265	0.000195	0.000273	0.144673
1990-01-04	0.000249	0.000300	0.000288	0.160404
1990-01-05	0.000386	0.000300	0.000334	0.086566
1990-01-08	0.000265	0.000240	0.000273	0.072656

As shown in the above table, the values of the feature variables are scaled down to lower values when compared to the real values given above.
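The scaling applied to each column is the min-max formula, x' = (x - min) / (max - min). The sketch below works it by hand on a toy column; the function name `min_max_scale` and the sample prices are our own illustration, not values from the dataset.

```python
import numpy as np

def min_max_scale(column):
    """Scale a 1-D array to [0, 1] using (x - min) / (max - min).

    Assumes the column is not constant; otherwise max - min is zero.
    """
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo)

# Toy prices, not taken from the dataset.
prices = [10.0, 12.5, 15.0, 20.0]
scaled = min_max_scale(prices)
print(scaled)  # [0.   0.25 0.5  1.  ]
```

The smallest value maps to 0 and the largest to 1, which is why the early 1990 rows in the table above sit so close to 0: later prices in the series are far higher.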

Step 7: Creating a Training Set and a Test Set for Stock Market Prediction

Before feeding the dataset into the model, we need to partition it into training and test sets. The LSTM model is trained on the training set, and its predictions are then evaluated against the test set.

To accomplish this, we use the TimeSeriesSplit class from the scikit-learn library. We set the number of splits to 10. The class produces a sequence of expanding training windows, each followed by a test block; the loop below keeps only the last of these, so roughly the final 10% of the rows serve as the test set and the preceding 90% or so train the LSTM model. The advantage of a time series split is that it preserves chronological order: the test set always comes after the training data, so the model is never evaluated on dates earlier than those it was trained on.

#Splitting to Training set and Test set
timesplit = TimeSeriesSplit(n_splits=10)
#Each iteration overwrites the previous split, so only the last (largest) split is kept
for train_index, test_index in timesplit.split(feature_transform):
    X_train, X_test = feature_transform.iloc[train_index], feature_transform.iloc[test_index]
    y_train, y_test = output_var.iloc[train_index].values.ravel(), output_var.iloc[test_index].values.ravel()
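To see which rows end up in each fold, the index arithmetic can be sketched in plain Python. This is our own illustration of an expanding-window split, assuming n_splits + 1 equal-sized blocks and no gap between training and test rows; the function name `expanding_window_folds` is hypothetical and not part of scikit-learn.

```python
def expanding_window_folds(n_samples, n_splits):
    """Return (train_end, test_end) index pairs, one per fold.

    Fold k trains on rows [0, train_end) and tests on rows
    [train_end, test_end). Assumes n_samples >= n_splits + 1.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    return [
        (start, start + test_size)
        for start in range(first_test_start, n_samples, test_size)
    ]

# Toy example: 22 rows and 10 splits give test blocks of 2 rows each.
for train_end, test_end in expanding_window_folds(22, 10):
    print(f"train rows 0..{train_end - 1}, test rows {train_end}..{test_end - 1}")

# The loop in this article keeps only the final fold.
last_train_end, last_test_end = expanding_window_folds(7334, 10)[-1]
print(last_train_end, last_test_end - last_train_end)
```

Every fold trains on all rows before its test block, so the training window grows while the test block slides forward in time.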

Step 8: Data Processing For LSTM

Once the training and test sets are finalized, we will input the data into the LSTM model. Before we can do that, we must transform the training and test set data into a format that the LSTM model can interpret. Because the LSTM requires its input in 3D form, we first convert the training and test data to NumPy arrays and then reshape them to the format (Number of Samples, 1, Number of Features), where the middle dimension is the number of time steps per sample. Here the training set holds 6667 samples, roughly 90% of the 7334 rows, and the number of features is 4. The training set is therefore reshaped to (6667, 1, 4). The test set is reshaped in the same way.

#Process the data for LSTM
trainX =np.array(X_train)
testX =np.array(X_test)
X_train = trainX.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = testX.reshape(X_test.shape[0], 1, X_test.shape[1])
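The effect of this reshape is easiest to see on a small array. The sketch below uses a toy 6-by-4 matrix as a stand-in for the scaled feature table; the numbers are placeholders, not stock data.

```python
import numpy as np

# Toy stand-in for the scaled feature table: 6 rows, 4 features.
X = np.arange(24, dtype=float).reshape(6, 4)

# (samples, features) -> (samples, timesteps=1, features)
X_3d = X.reshape(X.shape[0], 1, X.shape[1])
print(X_3d.shape)  # (6, 1, 4)

# Each sample is now a sequence of length 1 that holds the original row.
print(X_3d[2, 0])  # [ 8.  9. 10. 11.]
```

With a single time step per sample, each prediction is made from one day's features only; a longer look-back would need a middle dimension greater than 1.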

Step 9: Building the LSTM Model for Stock Market Prediction

We now construct the LSTM model. In this step, we build a Sequential Keras model with one LSTM layer. The LSTM layer has 32 units and is followed by one Dense layer with a single neuron.

We compile the model using the Adam optimizer and mean squared error as the loss function, a common combination for regression with an LSTM. The model is plotted and presented below.

#Building the LSTM Model
lstm = Sequential()
lstm.add(LSTM(32, input_shape=(1, trainX.shape[1]), activation='relu', return_sequences=False))
lstm.add(Dense(1))
lstm.compile(loss='mean_squared_error', optimizer='adam')
plot_model(lstm, show_shapes=True, show_layer_names=True)
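As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the helper names are our own.

```python
def lstm_param_count(units, n_features):
    """Trainable parameters of one LSTM layer: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * n_features + units * units + units)

def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

lstm_params = lstm_param_count(units=32, n_features=4)
dense_params = dense_param_count(n_inputs=32, n_outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 4736 33 4769
```

These figures can be compared against the output of `lstm.summary()` for the model built above.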

Step 10: Training the Stock Market Prediction Model

Finally, we use the fit function to train the LSTM model created above on the training data for 100 epochs with a batch size of 8.

#Model Training
history=lstm.fit(X_train, y_train, epochs=100, batch_size=8, verbose=1, shuffle=False)
Epoch 1/100
834/834 [==============================] - 3s 2ms/step - loss: 67.1211
Epoch 2/100
834/834 [==============================] - 1s 2ms/step - loss: 70.4911
Epoch 3/100
834/834 [==============================] - 1s 2ms/step - loss: 48.8155
Epoch 4/100
834/834 [==============================] - 1s 2ms/step - loss: 21.5447
Epoch 5/100
834/834 [==============================] - 1s 2ms/step - loss: 6.1709
Epoch 6/100
834/834 [==============================] - 1s 2ms/step - loss: 1.8726
Epoch 7/100
834/834 [==============================] - 1s 2ms/step - loss: 0.9380
Epoch 8/100
834/834 [==============================] - 2s 2ms/step - loss: 0.6566
Epoch 9/100
834/834 [==============================] - 1s 2ms/step - loss: 0.5369
Epoch 10/100
834/834 [==============================] - 2s 2ms/step - loss: 0.4761
.
.
.
.
Epoch 95/100
834/834 [==============================] - 1s 2ms/step - loss: 0.4542
Epoch 96/100
834/834 [==============================] - 2s 2ms/step - loss: 0.4553
Epoch 97/100
834/834 [==============================] - 1s 2ms/step - loss: 0.4565
Epoch 98/100
834/834 [==============================] - 1s 2ms/step - loss: 0.4576
Epoch 99/100
834/834 [==============================] - 1s 2ms/step - loss: 0.4588
Epoch 100/100
834/834 [==============================] - 1s 2ms/step - loss: 0.4599

The log shows the loss falling steeply during the first ten epochs, from 67.1211 to 0.4761, and then leveling off. By the end of the 100-epoch training procedure it sits at 0.4599, slightly above the 0.4542 reported at epoch 95.
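The 834 at the start of each progress line is the number of batches per epoch. It follows from the 6667 training samples mentioned in Step 8 and the batch size of 8, as this small sketch shows; the helper name is our own.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover all samples once;
    the last batch may be smaller than batch_size."""
    return math.ceil(n_samples / batch_size)

steps = steps_per_epoch(n_samples=6667, batch_size=8)
print(steps)          # 834
print(steps * 100)    # 83400 weight updates over 100 epochs
```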

Step 11: Making the LSTM Prediction

Now that the model is ready, we can use it to forecast the Adjusted Close value of the Microsoft stock on the test set. We do this by calling the predict function of the trained LSTM model.

#LSTM Prediction
y_pred= lstm.predict(X_test)

Step 12: Comparing Predicted vs True Adjusted Close Value – LSTM

Now that we have predicted the values for the test set, we can plot a graph comparing the true Adj Close values with the values predicted by the LSTM model.

#Predicted vs True Adj Close Value – LSTM
plt.plot(y_test, label='True Value')
plt.plot(y_pred, label='LSTM Value')
plt.title("Prediction by LSTM")
plt.xlabel('Time Scale')
#The target was not scaled in Step 6, so the axis is in plain USD
plt.ylabel('USD')
plt.legend()
plt.show()

The graph above shows that the very basic single-layer LSTM model created above detects some patterns. We may get a more accurate forecast for a specific company's stock value by fine-tuning the hyperparameters and adding more LSTM layers to the model.
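A plot gives a visual impression; two numbers that summarize the same comparison are the root mean squared error (RMSE) and the R² score. The sketch below computes both in plain Python on toy values that are not model output. The `mean_squared_error` and `r2_score` functions imported in Step 1 compute the same quantities; note that `lstm.predict` returns an array of shape (n, 1), so it would need flattening before being compared with `y_test`.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only, not model output.
y_true_toy = [100.0, 102.0, 104.0, 106.0]
y_pred_toy = [101.0, 101.0, 105.0, 105.0]

print(rmse(y_true_toy, y_pred_toy))  # 1.0
print(r2(y_true_toy, y_pred_toy))    # 0.8
```

RMSE is in the same unit as the target, while R² compares the model against always predicting the mean of the true values.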

Conclusion

Predicting the stock market was a time-consuming and laborious procedure a few years or even a decade ago. With the introduction of machine learning and its algorithms, recent market research has begun to apply these approaches to analyzing stock market data, and the procedure has become much simpler. For each date, the dataset provides the Opening Value of the stock, the Highest and Lowest values on the same day, the Closing Value at the end of the day, and the total volume traded. With this information, it is the job of a machine learning engineer or data scientist to look at the data and develop algorithms that may help estimate future stock values.

Machine learning can save time and resources, and a trained algorithm bases its output only on facts, numbers, and data without factoring in emotions or prejudice. As the results above show, the basic single-layer LSTM built here detects some patterns in the historical prices, and there is room to improve it. It would be interesting to incorporate sentiment analysis on news and social media regarding the stock market in general, as well as a given stock of interest.

We hope you liked the article and now have a clearer understanding of stock market prediction using machine learning. The same approach can serve as a starting point for your own stock price prediction projects.

Key Takeaways

Stock price prediction using machine learning tries to estimate the future values of a company's stock and other assets.
Reasonably accurate price forecasts can support investment decisions, although the forecasts are not guaranteed to be accurate.

Q1. How to make stock price predictions using machine learning?

A. Machine learning plays a significant role in stock market analysis. Models trained on historical data can be used to predict market fluctuations, study consumer behavior, and analyze stock prices.

Q2. Which machine learning algorithm is best for stock prediction?

A. LSTM (Long Short-Term Memory) is one of the most widely used algorithms for time series. It can capture historical trend patterns and use them to predict future values.

Q3. How to predict the stock market using AI?

A. Anyone can use AI to perform technical analysis: with a clear understanding of the historical data and trends, the model looks for patterns and analyzes them to estimate what may happen to the stock. This estimate is the prediction, and strategies are then built on it to achieve investment goals.

Q4. What is the AI project for stock price prediction?

A. An AI project for stock price prediction uses historical data to forecast prices. It is challenging because markets change, and the forecasts are not guaranteed to be accurate.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Prashant

Hello, my name is Prashant, and I'm currently pursuing my Bachelor of Technology (B.Tech) degree. I'm in my 3rd year of study, specializing in machine learning, and attending VIT University.

In addition to my academic pursuits, I enjoy traveling, blogging, and sports. I'm also a member of the sports club. I'm constantly looking for opportunities to learn and grow both inside and outside the classroom, and I'm excited about the possibilities that my B.Tech degree can offer me in terms of future career prospects.

Thank you for taking the time to get to know me, and I look forward to engaging with you further!

Xiaoping

Looks like your model cannot predict future's result base on the historic ones, for example using last week's data to predict tomorrow's price, as the features of tomorrow are not available .

Assel

Hi Prashant, Thank you for sharing! COuld you please advise how to obtain predicted price at next day?

Aryaa Money

Thanks for this blog. It is very good to understanding.
