Estimation of Neurons and Forward Propagation in Neural Network

Neha Last Updated : 14 Jan, 2025


Neural networks are a class of machine learning algorithms inspired by the structure and functioning of the human brain. A neural network consists of interconnected nodes, also known as neurons, that work together to solve complex problems. The number of neurons used in a neural network can significantly affect its performance and accuracy. In this article, we'll explore how to estimate a suitable number of neurons and how forward propagation lets a network make predictions. By the end, you'll better understand how neural networks work and how to tune them for your needs. So, let's dive into the estimation of neurons and forward propagation in neural networks.

What is Estimation of Neurons?
Estimation of Neurons or Nodes
Fully Connected Network (FNN)
Steps to Perform Neural Network
How to Calculate the Output for a Neural Network?
Estimate to Reach the Output
Squashing the Neural Net
Forward Propagation
Final Output
Conclusion

What is Estimation of Neurons?

In the context of neural networks, estimating neurons means determining the optimal number of neurons for each layer of the network. This is an important step in designing and training neural networks, because the number of neurons can significantly affect the network's performance. Too few neurons can lead to underfitting, where the model cannot capture the complexity of the data. Too many neurons can lead to overfitting, where the model fits the training data too closely and performs poorly on new data. Common ways to estimate the number of neurons include trial and error, cross-validation, and more advanced techniques such as pruning.
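The cross-validation approach can be sketched as follows: score each candidate neuron count on k folds and keep the one with the best mean score. This is a minimal sketch under stated assumptions. `train_and_score` is a hypothetical callback standing in for your own training routine and is assumed to return a validation score where higher is better. `toy_scorer` is a made-up placeholder, not a real model.

```python
from statistics import mean
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, val
        start += size


def pick_hidden_size(
    candidates: Sequence[int],
    n_samples: int,
    k: int,
    train_and_score: Callable[[int, List[int], List[int]], float],
) -> Tuple[int, float]:
    """Return the neuron count with the best mean validation score over k folds."""
    best_size, best_score = candidates[0], float("-inf")
    for size in candidates:
        fold_scores = [
            train_and_score(size, train, val)
            for train, val in k_fold_indices(n_samples, k)
        ]
        avg = mean(fold_scores)
        if avg > best_score:  # assumption: a higher score is better
            best_size, best_score = size, avg
    return best_size, best_score


def toy_scorer(size: int, train: List[int], val: List[int]) -> float:
    """Hypothetical stand-in for real training: it simply prefers 5 neurons."""
    return -abs(size - 5)


print(pick_hidden_size([2, 5, 10], n_samples=8, k=4, train_and_score=toy_scorer))
```

With a real scorer, each call would train a network with `size` neurons on the `train` records and evaluate it on the `val` records.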

Estimation of Neurons or Nodes

Let's start with a binary classification problem: predicting whether a customer will churn or not. For illustration, we will use a small dummy dataset with four input variables and eight observations.

We'll use a neural net with an architecture of [4, 5, 3, 2], made up of:

4 independent variables (the Xs) in the input layer, L1,
5 neurons in the first hidden layer, L2,
3 neurons in the second hidden layer, L3, and
2 nodes, O₁ and O₂, in the output layer, L4.

Fully Connected Network (FNN)

Let's label the neurons in our hidden layers for reference: N₁ through N₅ in hidden layer L2, and N₆, N₇, and N₈ in hidden layer L3. The output layer in a classification problem can be structured in two ways: it can have a single node, or one node for each class or category.

This network is called a Fully Connected Network (FNN), or Dense Network, since every neuron is connected to every node in the previous layer. Because data flows in one direction only, from input to output, it is also known as a Feedforward Neural Network or Sequential Neural Network.

Each neuron computes a linear combination of its inputs and their respective weights, plus a bias (or intercept) term. The neuron equation looks like this:

Z = Bias + W₁X₁ + W₂X₂ + …+ W_nX_n

where,

Z is the output of the neuron before any activation is applied,
Wᵢ are the weights, or beta coefficients,
Xᵢ are the independent variables, or inputs, and
Bias (intercept) = W₀.
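As a quick sketch of the equation above, the function below computes Z for a single neuron in Python. The weights, bias and input record are made-up numbers, and the function name `neuron_z` is hypothetical.

```python
from typing import Sequence


def neuron_z(weights: Sequence[float], inputs: Sequence[float], bias: float) -> float:
    """Compute Z = W0 + W1*X1 + ... + Wn*Xn for one neuron (no activation yet)."""
    if len(weights) != len(inputs):
        raise ValueError("expected exactly one weight per input variable")
    return bias + sum(w * x for w, x in zip(weights, inputs))


# Made-up weights W1..W4, bias W0 and one made-up record X1..X4.
z = neuron_z(weights=[0.5, -1.0, 0.25, 2.0], inputs=[2.0, 1.0, 4.0, 0.5], bias=0.5)
print(z)  # 0.5 + (1.0 - 1.0 + 1.0 + 1.0) = 2.5
```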

Steps to Perform Neural Network

There are three steps in training any neural network:

Compute the output, i.e. the predicted Y values (Y_pred), from the input variables using the linear combination Z = W₀ + W₁X₁ + W₂X₂ + … + W_nX_n.
Calculate the loss, or error term: the deviation of the predicted values from the actual values.
Minimize the loss function (the error term) by adjusting the weights.
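To make the three steps concrete, here is a minimal sketch for the simplest possible network: a single sigmoid neuron. Several choices are assumptions not made in the text: the loss is binary cross-entropy, step 3 uses plain gradient descent, the learning rate and epoch count are arbitrary, the weights start at zero rather than at random values, and the four-record dataset is made up.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Step 1: linear combination Z = b + sum(w_i * x_i), squashed into Y_pred."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def bce_loss(y_true: Sequence[int], y_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Step 2: mean binary cross-entropy between actual and predicted values."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1 to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def train(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Step 3: repeatedly nudge the weights to reduce the loss (gradient descent)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        preds = [predict(w, b, x) for x in X]
        # For sigmoid + cross-entropy, dLoss/dZ for each record is (Y_pred - Y).
        errs = [p - t for p, t in zip(preds, y)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
        b -= lr * sum(errs) / len(X)
    return w, b


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 1]
loss_before = bce_loss(y, [predict([0.0, 0.0], 0.0, x) for x in X])  # ln 2, about 0.693
w, b = train(X, y)
loss_after = bce_loss(y, [predict(w, b, x) for x in X])
print(round(loss_before, 3), round(loss_after, 3))
```

A real network repeats the same three steps, but with many neurons per layer, which is what the rest of the article builds up to.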

How to Calculate the Output for a Neural Network?

First, we will see how to calculate the output of a neural network; then we will look at the approaches that help converge to the optimal solution with the minimum error term.

The output layer receives information from hidden layer L3, which is connected to hidden layer L2 and ultimately to the input variables. The hidden layers create features automatically, without requiring manual derivation. This automatic feature generation is a key difference between deep learning and traditional machine learning, where features are often engineered by hand.


So, to compute the output, we first have to compute the values of all the nodes in the previous layers. Let us look at the mathematics behind this kind of neural net.

From the architecture above, we can see that the neurons cannot all share the single general equation given earlier. Instead, there is one such equation, with its own weights, for each neuron in both the hidden and the output layers.

The nodes in hidden layer L2 depend on the Xs in the input layer, so their equations are:

N₁ = W₁₁*X₁+ W₁₂*X₂ + W₁₃*X₃ + W₁₄*X₄ + W₁₀
N₂ = W₂₁*X₁+ W₂₂*X₂ + W₂₃*X₃ + W₂₄*X₄ + W₂₀
N₃ = W₃₁*X₁+ W₃₂*X₂ + W₃₃*X₃ + W₃₄*X₄ + W₃₀
N₄ = W₄₁*X₁+ W₄₂*X₂ + W₄₃*X₃ + W₄₄*X₄ + W₄₀
N₅ = W₅₁*X₁+ W₅₂*X₂ + W₅₃*X₃ + W₅₄*X₄ + W₅₀

Similarly, the nodes in hidden layer L3 are derived from the neurons in the previous hidden layer L2, so their equations are:

N₆ = W₆₁ * N₁ + W₆₂ * N₂ + W₆₃ * N₃ + W₆₄ * N₄ + W₆₅ * N₅ + W₆₀
N₇ = W₇₁ * N₁ + W₇₂ * N₂ + W₇₃ * N₃ + W₇₄ * N₄ + W₇₅ * N₅ + W₇₀
N₈ = W₈₁ * N₁ + W₈₂ * N₂ + W₈₃ * N₃ + W₈₄ * N₄ + W₈₅ * N₅ + W₈₀

The output layer nodes are derived from hidden layer L3, which gives the equations:

O₁ = WO₁₁ * N₆ + WO₁₂ * N₇ + WO₁₃ * N₈ + WO₁₀
O₂ = WO₂₁ * N₆ + WO₂₂ * N₇ + WO₂₃ * N₈ + WO₂₀
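The per-neuron equations above can be sketched in Python as one function applied layer by layer. Each weight row stores the bias first (W_j0) and then one coefficient per input, mirroring the subscripts above. The random weights, the seed and the input record are illustrative assumptions, and, like the equations, this sketch applies no activation function yet.

```python
import random
from typing import List, Sequence


def layer_equations(weight_rows: List[List[float]], inputs: Sequence[float]) -> List[float]:
    """Evaluate one layer of neuron equations.

    Row j is [W_j0, W_j1, ..., W_jn]: the bias first, then one weight per input.
    """
    outputs = []
    for row in weight_rows:
        bias, coeffs = row[0], row[1:]
        if len(coeffs) != len(inputs):
            raise ValueError("each row needs one weight per input plus a bias")
        outputs.append(bias + sum(w * v for w, v in zip(coeffs, inputs)))
    return outputs


def random_rows(n_inputs: int, n_neurons: int, rng: random.Random) -> List[List[float]]:
    """One row per neuron: a bias plus n_inputs coefficients, each drawn from [0, 1)."""
    return [[rng.random() for _ in range(n_inputs + 1)] for _ in range(n_neurons)]


rng = random.Random(0)            # fixed seed so the run is repeatable
W_L2 = random_rows(4, 5, rng)     # equations for N1..N5
W_L3 = random_rows(5, 3, rng)     # equations for N6..N8
W_out = random_rows(3, 2, rng)    # equations for O1, O2

x = [1.0, 0.0, 2.0, 1.0]            # one made-up record: X1..X4
n_L2 = layer_equations(W_L2, x)     # [N1, ..., N5]
n_L3 = layer_equations(W_L3, n_L2)  # [N6, N7, N8]
o = layer_equations(W_out, n_L3)    # [O1, O2]
print(len(n_L2), len(n_L3), len(o))  # 5 3 2
```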

Estimate to Reach the Output

Now, how many weights, or betas, must be estimated to reach the output? Counting all the weights (the Wᵢ terms) in the equations above gives 51. However, real-world models usually have far more than four input variables!

Moreover, the number of neurons and the number of hidden layers are themselves tuning parameters, so how will we know how many weights must be estimated to calculate the output? Is there a more efficient way than counting them manually? Here, "weights" refers to the beta coefficients of the input variables together with the bias terms, and the same convention is followed in the rest of the article.

The structure of the network is [4, 5, 3, 2]. Hidden layer L2 has 25 weights in total, which comes from (4 + 1) * 5, where 4 is the number of input variables in L1 and 5 is the number of neurons in L2. Each of the 5 neurons has one weight per input plus one bias term, which is where the (4 + 1) comes from, so the layer has 5 bias terms in total.

The weight count for each layer therefore follows a simple formula: take the number of nodes in the previous layer, add 1 for the bias term, and multiply this sum by the number of neurons in the current layer.

Similarly, hidden layer L3 has (5 + 1) * 3 = 18 weights, and the output layer has (3 + 1) * 2 = 8 weights.

The total number of weights in this neural network is the sum of the weights of the individual layers: 25 + 18 + 8 = 51.
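The counting rule is easy to automate. The sketch below (the function name `count_weights` is hypothetical) reproduces the 25 + 18 + 8 = 51 count and shows how the total grows with a hypothetical 20-variable input layer.

```python
from typing import List, Sequence, Tuple


def count_weights(layer_sizes: Sequence[int]) -> Tuple[List[int], int]:
    """Weights per layer = (nodes in previous layer + 1 bias) * neurons in this layer."""
    per_layer = [
        (n_prev + 1) * n_curr
        for n_prev, n_curr in zip(layer_sizes, layer_sizes[1:])
    ]
    return per_layer, sum(per_layer)


print(count_weights([4, 5, 3, 2]))   # ([25, 18, 8], 51)
print(count_weights([20, 5, 3, 2]))  # ([105, 18, 8], 131)
```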

We now know how many weights each layer has. The weights in the neuron equations above can also be arranged in matrix form, one matrix per layer. Here each matrix has one row per input plus one row for the bias, and one column per neuron in the layer, the same orientation used in the forward propagation section:

Hidden layer L2: a 5 * 5 matrix, holding the (4 + 1) * 5 = 25 weights of N₁ to N₅.
Hidden layer L3: a 6 * 3 matrix, holding the (5 + 1) * 3 = 18 weights of N₆ to N₈.
Output layer L4: a 4 * 2 matrix, holding the (3 + 1) * 2 = 8 weights of O₁ and O₂.


Now we know how many weights are needed to compute the output, but how do we find their values? In the first iteration, we assign random values between 0 and 1 to the weights. In the following iterations, these weights are adjusted until they converge to values that minimize the error term.
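A minimal sketch of this first-iteration step, assuming each layer's weights are stored as a (previous nodes + 1) * (neurons) matrix whose first row holds the biases. The function name, the seed and this layout are assumptions for illustration; many libraries use other initialization schemes, for example small random values centered on zero.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def init_weights(layer_sizes: Sequence[int], seed: int = 42) -> List[Matrix]:
    """Create one (n_prev + 1) x n_curr weight matrix per layer.

    Row 0 holds the bias of each neuron; rows 1..n_prev hold the betas.
    Every value is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    return [
        [[rng.random() for _ in range(n_curr)] for _ in range(n_prev + 1)]
        for n_prev, n_curr in zip(layer_sizes, layer_sizes[1:])
    ]


weights = init_weights([4, 5, 3, 2])
print([(len(m), len(m[0])) for m in weights])  # [(5, 5), (6, 3), (4, 2)]
```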

We focus on minimizing the error because it measures how far our model's predictions deviate from the actual observed values. Therefore, to improve the predictions, we keep updating the weights so that the loss, or error, is minimized.

This adjustment of the weights is also called the correction of the weights. Two processes work together to correct the betas, or weights, until they converge: Forward Propagation, which passes the inputs through the network to produce the predictions and the error, and Backward Propagation, which uses that error to update the weights. We will look at each of these techniques in depth; however, before that, let's close the loop on what the neural net does after estimating the betas.

Squashing the Neural Net

The next step in computing the output is to apply a transformation to these linear equations. Since the neural net at hand solves a classification problem, how can a linear equation be used to assign the output to classes?

For a binary classification problem, we use the Sigmoid function to transform the linear equation into a nonlinear one whose output lies between 0 and 1 and can be read as a probability. If you are not sure why the Sigmoid is used for this, it may help to revisit logistic regression.

For a particular node, the transformation is as follows:

N₁ = W₁₁*X₁+ W₁₂*X₂ + W₁₃*X₃ + W₁₄*X₄ + W₁₀

After implementing the Sigmoid transformation, it becomes:

h₁ = sigmoid(N₁)

where,

$$\text{sigmoid}(N_1) = \frac{e^{N_1}}{1 + e^{N_1}} = \frac{e^{(W_{11}X_1 + W_{12}X_2 + W_{13}X_3 + W_{14}X_4 + W_{10})}}{1 + e^{(W_{11}X_1 + W_{12}X_2 + W_{13}X_3 + W_{14}X_4 + W_{10})}}$$
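A minimal Python version of this squashing function is sketched below. Computing 1 / (1 + exp(-z)) for non-negative z and exp(z) / (1 + exp(z)) otherwise is an implementation choice, not something from the article; it avoids overflow for large inputs, and both branches equal the formula above. The value 2.5 for N₁ is made up.

```python
import math


def sigmoid(z: float) -> float:
    """exp(z) / (1 + exp(z)), computed so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


n1 = 2.5          # a made-up value of N1
h1 = sigmoid(n1)  # squashed into the range (0, 1)
print(round(h1, 4))  # 0.9241
```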

This transformation is applied in both the hidden layers and the output layer, and is known as the Activation or Squashing Function. It adds non-linearity to the network, which matters because not every business problem can be solved with a linear model.

Figure: Squashing the neural net (source: media-exp1.licdn.com)

There are various types of activation functions, each suited to different uses. In the output layer, the choice of activation function depends on the type of business problem; for binary classification, the squashing function for the output layer is the Sigmoid.

Hence, to find the output, we estimate the weights and apply this mathematical transformation. The output of a node is the outcome of its activation function.

So far, we have completed only step 1 of the neural network: taking the input variables and computing the output. Next, we calculate the error term. And keep in mind that, so far, this has been done for only one record! The entire cycle is repeated for every record.

Relax, we don't have to do this manually: the network performs these steps in the background. The aim here is to understand how the network works.

A neural network can be traversed from left to right as well as from right to left. The right-to-left process of adjusting the weights, from the output layer back to the input layer, is Backward Propagation (I will cover this in the next article).

Forward Propagation

The process of going from left to right, i.e. from the input layer to the output layer, is Forward Propagation. It computes the network's output, and hence the error, for the current weights; that error is then used to correct the weights so that the loss function is minimized. Let's see how this works mathematically.

Our binary classification dataset has an input X that is a 4 * 8 matrix (4 input variables, 8 records) and a Y variable that is a 2 * 8 matrix, with one row for class 1 and one row for class 0, and one column per record. Some of the input variables were categorical and have been converted to dummy variables.

We begin with a 4 * 8 input matrix and aim for a 2 * 8 output. The number of hidden layers and the number of neurons in each layer are hyperparameters, defined by the user. We get from the input to the output through matrix multiplication between the input variables and the weights of each layer.

As we have seen above, each layer has its own weight matrix. We begin with the 4 * 8 input matrix and multiply it by the weight matrix between layers L1 and L2, which gives us the matrix for layer L2. We repeat these matrix multiplications through each layer until we reach the final 2 * 8 output layer.


Note that the neuron equations above describe a single observation. The network repeats this computation for every observation, which is what the matrix multiplication below does in one go.

Final Output

Now, let’s break down the steps to understand how the matrix multiplication in Forward propagation works:

1. First, the input matrix X is 4 * 8, and the weight matrix between L1 and L2, which we will call W_h1, is 5 * 5 (as we saw above).
2. This 5 * 5 matrix W_h1 includes both the betas (coefficients) and the bias terms.
3. For simplicity, we split W_h1 into the beta weights and the bias; from here on, W_h1 refers to the beta weights only and b_h1 to the bias. The beta weights between L1 and L2 form a 4 * 5 matrix, since there are 4 input variables in L1 and 5 neurons in hidden layer L2.
4. To illustrate, let's walk through the multiplication for one layer:

We could compute the result element by element, but that would give the result for only one observation, or record. To get the result for all 8 observations in one go, we multiply the two matrices.

For matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second. Our input matrix has 8 columns, but the weight matrix has 4 rows, so X * W_h1 is not defined.

So, what do we do? We transpose the weight matrix to 5 * 4 and place it first: W_h1^T (5 * 4) has 4 columns, matching the 4 rows of X (4 * 8), so the product W_h1^T * X is defined and has a shape of 5 * 8.
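The shape rule can be checked with a small pure-Python sketch that uses plain nested lists rather than a library such as NumPy, so it runs with the standard library alone. The values in `X` and `W_h1` are made up; only their shapes, 4 * 8 and 4 * 5, follow the article, and the helper names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shape(m: Matrix) -> Tuple[int, int]:
    return len(m), len(m[0])


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns: an r x c matrix becomes c x r."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (r x k) by b (k x c); the inner dimensions k must match."""
    if len(a[0]) != len(b):
        raise ValueError(f"cannot multiply {shape(a)} by {shape(b)}")
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


X = [[float(i + j) for j in range(8)] for i in range(4)]  # made-up 4 x 8 inputs
W_h1 = [[0.1] * 5 for _ in range(4)]                       # made-up 4 x 5 betas

try:
    matmul(X, W_h1)  # (4 x 8) times (4 x 5): 8 != 4, so this is rejected
except ValueError as err:
    print(err)       # cannot multiply (4, 8) by (4, 5)

product = matmul(transpose(W_h1), X)  # (5 x 4) times (4 x 8)
print(shape(product))                 # (5, 8)
```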

So, after adding the bias term, the result between the input layer and hidden layer L2 becomes Z₁ = W_h1^T * X + b_h1, a 5 * 8 matrix; the bias of each neuron is added to every column, i.e. to every record.

5. The next step is to apply the activation function to Z₁. Applying the activation function does not change the shape, so h₁ = activation function(Z₁) is also 5 * 8.

6. Following the same steps, forward propagation computes the outcome of each subsequent layer:

Note that for the next layer, between L2 and L3, the input is no longer X but h₁, the result from L1 and L2.

Z₂ = W_h2^T * h₁ + b_h2,

where:

W_h2 is the weight matrix between L2 and L3, with a shape of 5 * 3,
W_h2^T is the transpose of W_h2, with a shape of 3 * 5,
h₁ is the result from L1 and L2, with a shape of 5 * 8, and
b_h2 is the bias term, with one value per neuron in L3.

So, the matrix multiplication in Z₂ = W_h2^T * h₁ + b_h2 is:

Z₂ = [(3 * 5) * (5 * 8)] + b_h2, which gives Z₂ a shape of 3 * 8. Applying the activation function again gives h₂ = activation function(Z₂), also of shape 3 * 8.

7. We repeat these steps for the computation of the last layer.

This time, for the layer between L3 and L4, the input is h₂, the result from L2 and L3.

Z₃ = W_h0^T * h₂ + b_h0,

where:

W_h0 is the weight matrix between L3 and L4, with a shape of 3 * 2,
W_h0^T is the transpose of W_h0, with a shape of 2 * 3,
h₂ is the result from L2 and L3, with a shape of 3 * 8, and
b_h0 is the bias term, with one value per output node.

So, the matrix multiplication in Z₃ = W_h0^T * h₂ + b_h0 is:

Z₃ = [(2 * 3) * (3 * 8)] + b_h0, which gives Z₃ a shape of 2 * 8. We again apply an activation function, this time the Sigmoid, since we need class probabilities as the output: O = Sigmoid(Z₃), of shape 2 * 8.
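Putting the layers together, the sketch below runs the whole forward pass for the [4, 5, 3, 2] network on all 8 records at once. Some parts are assumptions not taken from the article: the data and weights are random placeholders, and the Sigmoid is also used as the activation in the hidden layers (the article only says "activation function" there). The helper names `layer`, `matmul`, `transpose` and `rand` are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def layer(W: Matrix, b: List[float], h: Matrix) -> Matrix:
    """Z = W^T * h + b (neuron j's bias added to every column of row j), then sigmoid."""
    z = matmul(transpose(W), h)
    return [[sigmoid(v + b_j) for v in row] for row, b_j in zip(z, b)]


def rand(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
X = rand(rng, 4, 8)                                  # 4 variables x 8 records
W_h1, b_h1 = rand(rng, 4, 5), [rng.random() for _ in range(5)]
W_h2, b_h2 = rand(rng, 5, 3), [rng.random() for _ in range(3)]
W_h0, b_h0 = rand(rng, 3, 2), [rng.random() for _ in range(2)]

h1 = layer(W_h1, b_h1, X)   # 5 x 8
h2 = layer(W_h2, b_h2, h1)  # 3 x 8
O = layer(W_h0, b_h0, h2)   # 2 x 8
print([(len(m), len(m[0])) for m in (h1, h2, O)])  # [(5, 8), (3, 8), (2, 8)]
```

Each column of `O` holds the two output values for one record; comparing them with the 2 * 8 Y matrix is what gives the error term.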

After estimating the output through forward propagation, we calculate the error. Adjusting the weights to reduce this error, and then running forward propagation again, continues until we reach the optimal solution.

The weights themselves are adjusted using Backward Propagation, which we will explore in the next article.

Conclusion

The estimation of neurons and forward propagation are fundamental concepts in neural networks. Estimating the required number of neurons in a neural network is crucial to prevent overfitting and underfitting, which can harm performance. Forward propagation is the process of moving data through the neural network, allowing it to make predictions. This article has only introduced these concepts briefly. There is much more to explore in the field of neural networks.


Frequently Asked Questions

Q1. What is the cost formula in neural network?

A. The cost formula in a neural network, also called the loss function, measures the difference between predicted outputs and target values during training. It quantifies the network's performance and guides the adjustment of the model's parameters to minimize errors. Common examples include mean squared error for regression tasks and cross-entropy for classification tasks.

Q2. What is the formula for deep network calculation?

A. The formula for deep network calculation involves sequentially computing the output of each layer. The process begins with input data fed into the first layer. Activation functions are then applied to the weighted sum of inputs at each layer. This continues layer by layer until the output layer is reached. Finally, the output layer provides the prediction or result of the computation by the deep neural network.

Analytics Vidhya does not own the media shown in this article, and the author uses it at their discretion.

Neha

Hi there! I am Neha Seth. I work as a Data Scientist in Larsen & Toubro Infotech (LTI). I hold a Postgraduate Program in Data Science & Engineering from the Great Lakes Institute of Management and a Bachelors in Statistics. I have been featured as Top 10 Most Popular Guest Authors in 2020 on Analytics Vidhya (AV).

My area of interest lies in NLP and Deep Learning. I have also passed the CFA Program. You can reach out to me on LinkedIn and can read my other blogs for AV.
