Regularization in Deep Learning with Python Code

shubham.jain Last Updated : 03 Dec, 2024

12 min read

One of the most common challenges encountered by data science professionals is the issue of overfitting. Have you ever experienced a scenario where your machine learning model or regression model excelled on the training data but struggled to make accurate predictions on unseen test data? This is often a trade-off between high bias (underfitting) and high variance (overfitting), impacting the model’s ability to generalize effectively. Overfitting can lead to poor performance on new data, especially in the presence of outliers or noise in the training set. If so, this article addresses your concerns by exploring techniques such as regularization in deep learning and essential concepts of bagging, boosting, and stacking in ensemble learning to improve model generalization.

Avoiding overfitting can single-handedly improve our model’s performance.

In this article, you will delve into the concept of regularization in neural networks, discover different regularization techniques, and understand how these approaches can greatly enhance model performance while preventing overfitting.

Note: This article assumes that you have basic knowledge of neural networks and their implementation in Keras. If not, you can refer to the below articles first–

What is Regularization?
How does Regularization help Reduce Overfitting?
Different Regularization Techniques in Deep Learning
A Case Study on MNIST Data with Keras
Conclusion
Frequently Asked Questions

What is Regularization?

Regularization is a technique used in machine learning and deep learning to prevent overfitting and improve a model’s generalization performance. It involves adding a penalty term to the loss function during training.

This penalty discourages the model from becoming too complex or having large parameter values, which helps in controlling the model’s ability to fit noise in the training data. Regularization in deep learning methods includes L1 and L2 regularization, dropout, early stopping, and more. By applying regularization for deep learning, models become more robust and better at making accurate predictions on unseen data.

Also Read: Regularization in Machine Learning

Before we deep dive into the topic, take a look at this image:

Have you seen this image before? As we move towards the right in this image, our model tries to learn too well the details and the noise from the training data, ultimately resulting in poor performance on the unseen data.

In other words, while going toward the right, the complexity of the model increases such that the training error reduces but the testing error doesn’t. This is shown in the image below:

If you’ve built a neural network before, you know how complex they are. This makes them more prone to overfitting.

Regularization is a technique that modifies the learning algorithm slightly so that the model generalizes better. This, in turn, improves the model’s performance on unseen data as well.

How does Regularization help Reduce Overfitting?

Let’s consider a neural network that is overfitting on the training data as shown in the image below:

If you have studied the concept of regularization in machine learning, you will have a fair idea that regularization penalizes the coefficients. In deep learning, it penalizes the weight matrices of the nodes.

Assume that our regularization coefficient is so high that some of the weight matrices are nearly equal to zero.

This will result in a much simpler linear network and slight underfitting of the training data.

Such a large value of the regularization coefficient is not that useful. We need to optimize the value of the regularization coefficient to obtain a well-fitted model as shown in the image below:

Also Read: How to avoid Over-fitting using Regularization?

Different Regularization Techniques in Deep Learning

Now that we understand how regularization helps reduce overfitting, we’ll learn a few different techniques for applying regularization in deep learning.

L2&L1 Regularization

L1 and L2 are the most common types of regularization deep learning. These update the general cost function by adding another term known as the regularization term.

Cost function = Loss (say, binary cross entropy) + Regularization term

Due to the addition of this regularization term, the values of weight matrices decrease because it assumes that a neural network with smaller weight matrices leads to simpler models. Therefore, it will also reduce overfitting to quite an extent.

However, this regularization term differs in L1 and L2.

For L2:

For L1:

In L2, we have: ||w||^2 = Σ w_i^2. This is known as ridge regression, where lambda is the regularization parameter. It is the hyperparameter whose value is optimized for better results. L2 regularization is also known as weight decay as it forces the weights to decay towards zero (but not exactly zero).

In L1, we having: ||w||=Σ |w_i|. In this, we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to zero here. L1 regularization is also called lasso regression. Hence, it is very useful when we are trying to compress our model. Otherwise, we usually prefer L2 over it.

In keras, we can directly apply regularization for deep learning to any layer using the regularizers.

Below is the sample code to apply L2 regularization to a Dense layer:

from keras import regularizers

model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01)))

Note: Here the value 0.01 is the value of regularization parameter, i.e., lambda, which we need to optimize further. We can optimize it using the grid-search method.

Similarly, we can also apply L1 regularization deep learning. We will look at this in more detail in a case study later in this article.

Dropout

This is one of the most interesting types of regularization techniques. It also produces very good results and is consequently the most frequently used regularization technique in the field of deep learning.

To understand dropout, let’s say our neural network structure is akin to the one shown below:

So what does dropout do? At every iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections as shown below:

Each iteration has a different set of nodes, which results in a different set of outputs. This can also be thought of as an ensemble technique in machine learning.

Ensemble models usually perform better than a single model as they capture more randomness. Similarly, dropout models also perform better than normal neural network models.

This probability of choosing how many nodes should be dropped is the hyperparameter of the dropout function. As seen in the image above, dropout can be applied to both the hidden layers as well as the input layers.

nodes , regularization in deep learning — Source: chatbotslife

Due to these reasons, dropout is usually preferred when we have a large neural network structure to introduce more randomness.

In Keras, we can implement dropout using the Keras core layer. Below is the Python code for it:

from keras.layers.core import Dropout

model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu'),
 Dropout(0.25),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

As you can see, we have defined 0.25 as the probability of dropping. We can tune it further for better results using the grid search method.

Data Augmentation

The simplest way to reduce overfitting is to increase the training data size. In machine learning, however, increasing the training data size was impossible as the labeled data was too costly.

But now, let’s consider we are dealing with images. In this case, there are a few ways of increasing the size of the training data—rotating the image, flipping, scaling, shifting, etc. In the image below, some transformation has been done on the handwritten digits dataset.

Data Augmentation , regularization in deep learning

This technique is known as data augmentation. It usually provides a big leap in improving the accuracy of the model, and it can be considered a mandatory trick to improve our predictions.

In keras, we can perform all of these transformations using ImageDataGenerator. It has a big list of arguments that you can use to pre-process your training data.

Below is the sample code to implement it:

from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(horizontal flip=True)
datagen.fit(train)

Early Stopping

Early stopping is a cross-validation strategy in which we keep one part of the training set as the validation set. When we see that the performance on the validation set is getting worse, we immediately stop the training on the model.

In the above image, we will stop training at the dotted line since, after that, our model will start overfitting on the training data.

In keras, we can apply early stopping using the callbacks function. Below is the sample code for it.

from keras.callbacks import EarlyStopping

EarlyStopping(monitor='val_err', patience=5)

Here, monitor refers to the quantity that you need to keep track of, and ‘val_err’ refers to the validation error.

Patience denotes the number of epochs with no further improvement, after which training stops. For a better understanding, let’s look at the above image again. After the dotted line, each epoch will result in a higher validation error value. Therefore, our model will stop 5 epochs after the dotted line (since our patience equals 5) because it sees no further improvement

Note: After 5 epochs (the value generally defined for patience), the model might start improving again, and the validation error may decrease. Therefore, we need to take extra care while tuning this hyperparameter.

Also Read: Prevent Overfitting Using Regularization Techniques

A Case Study on MNIST Data with Keras

By this point, you should theoretically understand the different techniques we have gone through. We will now apply this knowledge to our deep learning practice problem – Identify the digits. Once you have downloaded the dataset, start following the below code! First, we’ll import some of the basic libraries.

%pylab inline

import numpy as np
import pandas as pd
from scipy.misc import imread
from sklearn.metrics import accuracy_score

from matplotlib import pyplot

import tensorflow as tf
import keras

# To stop potential randomness
seed = 128
rng = np.random.RandomState(seed)

Now, let’s load the dataset.

root_dir = os.path.abspath('/Users/shubhamjain/Downloads/AV/identify the digits/')
data_dir = os.path.join(root_dir, 'data')
sub_dir = os.path.join(root_dir, 'sub')

## reading train file only
train = pd.read_csv(os.path.join(data_dir, 'Train', 'train.csv'))
train.head()

Output:

Take a look at some of our images now.

img_name = rng.choice(train.filename)

filepath = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)

img = imread(filepath, flatten=True)

pylab.imshow(img, cmap='gray')
pylab.axis('off')
pylab.show()

#storing images in numpy arrays
temp = []
for img_name in train.filename:
 image_path = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)
 img = imread(image_path, flatten=True)
 img = img.astype('float32')
 temp.append(img)
 
x_train = np.stack(temp)

x_train /= 255.0
x_train = x_train.reshape(-1, 784).astype('float32')

y_train = keras.utils.np_utils.to_categorical(train.label.values)

Create a validation dataset to optimize our model for better scores. We will use a 70:30 train-validation dataset ratio.

split_size = int(x_train.shape[0]*0.7)

x_train, x_test = x_train[:split_size], x_train[split_size:]
y_train, y_test = y_train[:split_size], y_train[split_size:]

First, let’s build a simple neural network with 5 hidden layers, each with 500 nodes.

# import keras modules
from keras.models import Sequential
from keras.layers import Dense

# define vars
input_num_units = 784
hidden1_num_units = 500
hidden2_num_units = 500
hidden3_num_units = 500
hidden4_num_units = 500
hidden5_num_units = 500
output_num_units = 10

epochs = 10
batch_size = 128

model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu'),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu'),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu'),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu'),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu'),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

Also Read: Train-Test-Validation Split in 2024

Evaluating the Model

Note that we are just running it for 10 epochs. Let’s quickly check the performance of our model.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

Output:

Evaluating the Model | regularization in deep learning

Now, let’s try the L2 regularizer over it and check whether it gives better results than a simple neural network model.

from keras import regularizers

model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu',
 kernel_regularizer=regularizers.l2(0.0001)),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu',
 kernel_regularizer=regularizers.l2(0.0001)),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu',
 kernel_regularizer=regularizers.l2(0.0001)),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu',
 kernel_regularizer=regularizers.l2(0.0001)),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu',
 kernel_regularizer=regularizers.l2(0.0001)),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

Output:

Note that the value of lambda is equal to 0.0001. Great! We just obtained an accuracy that is greater than our previous NN model.

Now, let’s try the L1 regularization technique.

## l1

model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu',
 kernel_regularizer=regularizers.l1(0.0001)),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu',
 kernel_regularizer=regularizers.l1(0.0001)),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu',
 kernel_regularizer=regularizers.l1(0.0001)),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu',
 kernel_regularizer=regularizers.l1(0.0001)),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu',
 kernel_regularizer=regularizers.l1(0.0001)),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

Output:

This doesn’t show any improvement over the previous model. Let’s jump to the dropout technique.

## dropout

from keras.layers.core import Dropout
model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu'),
 Dropout(0.25),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

## dropout

from keras.layers.core import Dropout
model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu'),
 Dropout(0.25),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

Output:

Not bad! Dropout also gives us a little improvement over our simple NN model.

Now, let’s try data augmentation.

from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(zca_whitening=True)

# loading data
train = pd.read_csv(os.path.join(data_dir, 'Train', 'train.csv'))

temp = []
for img_name in train.filename:
 image_path = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)
 img = imread(image_path, flatten=True)
 img = img.astype('float32')
 temp.append(img)
 
x_train = np.stack(temp)

X_train = x_train.reshape(x_train.shape[0], 1, 28, 28)

X_train = X_train.astype('float32')

Now, fit the training data in order to augment.

# fit parameters from data
datagen.fit(X_train)

Here, I have used zca_whitening as the argument, which highlights the outline of each digit as shown in the image below:

## splitting
y_train = keras.utils.np_utils.to_categorical(train.label.values)
split_size = int(x_train.shape[0]*0.7)

x_train, x_test = X_train[:split_size], X_train[split_size:]
y_train, y_test = y_train[:split_size], y_train[split_size:]

## reshaping
x_train=np.reshape(x_train,(x_train.shape[0],-1))/255
x_test=np.reshape(x_test,(x_test.shape[0],-1))/255

## structure using dropout
from keras.layers.core import Dropout
model = Sequential([
 Dense(output_dim=hidden1_num_units, input_dim=input_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden2_num_units, input_dim=hidden1_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden3_num_units, input_dim=hidden2_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden4_num_units, input_dim=hidden3_num_units, activation='relu'),
 Dropout(0.25),
 Dense(output_dim=hidden5_num_units, input_dim=hidden4_num_units, activation='relu'),
 Dropout(0.25),

Dense(output_dim=output_num_units, input_dim=hidden5_num_units, activation='softmax'),
 ])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

Output:

Wow! We made a big leap in the accuracy score. The good thing is that it works every time. We just need to select a proper argument based on the images in our dataset.

Now, let’s try our final technique – early stopping.

from keras.callbacks import EarlyStopping

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
trained_model_5d = model.fit(x_train, y_train, nb_epoch=epochs, batch_size=batch_size, validation_data=(x_test, y_test)
 , callbacks = [EarlyStopping(monitor='val_acc', patience=2)])

Output:

Our model stops after only 5 iterations as the validation accuracy does not improve. It gives good results in cases where we run it for a larger number of epochs, so it’s a technique to optimize the number of epochs.

Also Read: Techniques To Prevent Overfitting In Neural Networks

Conclusion

I hope you now understand regularization deep learning techniques in deep learning and the different methods required to implement them, not just for linear regression models but also for complex neural networks like CNN. Regularization is crucial when training deep learning models on large datasets to prevent overfitting and improve generalization. You can effectively employ techniques like L1/L2 regularization, dropout, data augmentation, and early stopping using gradient descent optimization and appropriate tuning of hyperparameters like the learning rate.

Elastic net regularization, which combines L1 and L2 penalties, is another powerful approach worth exploring. I highly recommend applying these regularization deep learning methods when dealing with a deep learning task, as they will help you expand your horizons, better understand the topic, and build more robust machine learning algorithms.

Hope you like the article and better understand regularization in deep learning. Regularisation techniques, such as L1 and L2 regularization, dropout, and data augmentation, prevent overfitting, and improve model performance.

Frequently Asked Questions

Q1. What is regularization in deep learning?

A. Regularization in deep learning is a technique used to prevent overfitting and improve neural network generalization. It involves adding a regularization term to the loss function, which penalizes large weights or complex model architectures. Regularization methods such as L1 and L2 regularization, dropout, and batch normalization help control model complexity and improve neural network generalization to unseen data.

Q2. What is L1 regularization and L2 regularization?

A. L1 regularization, also known as Lasso regularization, is a method in deep learning that adds the sum of absolute values of the weights to the loss function. It encourages sparsity by driving some weights to zero, resulting in feature selection. L2 regularization, also called Ridge regularization, adds the sum of squared weights to the loss function, promoting smaller but non-zero weights and preventing extreme values.

Q3. What is dropout in neural network?

A. Dropout serves as a regularization technique in neural networks to prevent overfitting. During training, the model randomly “drops out” a subset of neurons by setting their outputs to zero with a certain probability. This forces the network to learn more robust and independent features, as it cannot rely on specific neurons. Dropout improves generalization and reduces the risk of overfitting.

Q4.What is L1 and L2 regularization technique?

L1 and L2 regularization prevent overfitting by adding penalties to model complexity:
L1 (Lasso): Reduces some coefficients to zero, helping with feature selection.
L2 (Ridge): Shrinks all coefficients but keeps them non-zero, reducing their impact without eliminating features.

shubham.jain

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Pramod

Thanks a lot bro. Immensely helpful.

Chetan

Could we achieve the same validation accuracy with much lesser architecture ? If so, then our Hyperparameters (especially the L2 lambda) must be quite high (say 0.05). Isn't it ? Or, is there any other way of combating this ??

Florido Meacci

Thank you for the great article. I'm trying to differentiate between different font weigths of the same font and have been hitting a wall. Was wondering if you could share some insights and if you have had experience with this? What would you recommended as for the best settings, and how big of a dataset would you recommend.

Reading list

Introduction to Deep Learning

Feed Forward Networks

Gradient Descent

Loss Function

Activation Functions

Introduction to Neural networks

Forward and Backward Propagation

Optimizers

Learning Rate Schedulers

NN on Structured Data

Improving the Deep Learning Model

Deep Learning Model Optimization

Unsupervised Deep Learning

AutoDL

Model Deployment

Introduction to PyTorch

Regularization in Deep Learning with Python Code

Table of contents

What is Regularization?

How does Regularization help Reduce Overfitting?

Different Regularization Techniques in Deep Learning

L2&L1 Regularization

Dropout

Data Augmentation

Early Stopping

A Case Study on MNIST Data with Keras

Evaluating the Model

Conclusion

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit

li_at

s_plt

lang

s_tp

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

s_pltp