How to Treat Overfitting in Convolutional Neural Networks

Guest Blog Last Updated : 08 Sep, 2020

Introduction

Overfitting, or high variance, occurs when a machine learning model scores noticeably higher accuracy on its training dataset, the dataset used to “teach” the model, than on its testing dataset. In terms of loss, overfitting reveals itself when your model has a low error on the training set and a higher error on the testing set. You can identify this visually by plotting your loss and accuracy metrics for both datasets and seeing where the curves diverge: training performance keeps improving while testing performance stalls or gets worse.

Overfitting in CNNs - Loss vs. Epoch Plot

Overfitting in CNNs - Accuracy vs. Epoch Plot
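
The divergence in these plots can also be checked numerically. The sketch below is a minimal illustration, assuming you have per-epoch training and validation loss values as plain Python lists (for example, copied out of a Keras training history). The function name first_overfit_epoch and the patience parameter are hypothetical, not part of any library.

```python
def first_overfit_epoch(train_loss, val_loss, patience=2):
    """Return the first epoch index after which validation loss rises for
    `patience` consecutive epochs while training loss keeps falling.
    Returns None if no such point is found."""
    n = min(len(train_loss), len(val_loss))
    for start in range(n - patience):
        window = range(start, start + patience)
        val_rising = all(val_loss[i + 1] > val_loss[i] for i in window)
        train_falling = all(train_loss[i + 1] < train_loss[i] for i in window)
        if val_rising and train_falling:
            return start
    return None


# Made-up loss values for illustration only
train = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val = [0.95, 0.70, 0.58, 0.61, 0.66, 0.72]
print(first_overfit_epoch(train, val))  # prints 2
```

Here the validation loss bottoms out at epoch index 2 and climbs afterwards while the training loss keeps shrinking, which is the pattern the plots show.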

Overfitting usually indicates that your model is too complex for the problem it is solving: too many features in the case of regression models and ensemble learning, too many filters in the case of Convolutional Neural Networks, or too many layers in the case of Deep Learning models in general. The model ends up memorizing the example data, but it performs poorly on any new data.

This is frustrating, but it can be resolved by tuning your hyperparameters. First, though, let’s make sure our data is divided into well-proportioned sets.

Splitting the Data

For a deep learning model, I recommend having 3 datasets: training, validation, and testing. Use the validation set to fine-tune your model until you’re satisfied with its performance, then switch to the testing data to evaluate the best version of your model on images it has never seen. First, we’ll import the necessary function:

from sklearn.model_selection import train_test_split

Now let’s talk proportions. My ideal ratio is 70/10/20: ~70% of your data goes to the training set, 10% to the validation set, and 20% to the test set, like so:

# Create the Validation Dataset
Xtrain, Xval, ytrain, yval = train_test_split(train_images, train_labels_final, train_size=0.9, test_size=0.1, random_state=42)

# Create the Test and Final Training Datasets
Xtrain, Xtest, ytrain, ytest = train_test_split(Xtrain, ytrain, train_size=0.78, random_state=42)

You will need to perform two train_test_split() function calls. The first call is done on the initial set of images and labels to form the validation set. We set random_state to keep the results consistent between runs, test_size to make the validation set 10% of the data, and train_size to make the training set the remaining 90%.

The train_size argument can be omitted, because scikit-learn sets it to the complement of test_size by default. The variables Xval and yval refer to our validation images and labels. On the second call, we generate our testing dataset from our newly formed training data, Xtrain and ytrain. We repeat the above, but this time we set the newest training set to be 78% of the previous one (0.9 × 0.78 ≈ 0.70 of the original data) and assign it to the same variables as before for consistency. Meanwhile, we assign the testing data to Xtest for the test images and ytest for the test labels.
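
To double-check that the two calls really produce a 70/10/20 split, the arithmetic can be sketched in a few lines. This is only an illustration: the helper name split_fractions is hypothetical, and it works on the fractions alone, not on the actual image arrays.

```python
def split_fractions(first_train_size, second_train_size):
    """Fractions of the original data that end up in each set after
    two successive train/test splits."""
    val = 1.0 - first_train_size                         # held out by the first call
    train = first_train_size * second_train_size         # kept by the second call
    test = first_train_size * (1.0 - second_train_size)  # held out by the second call
    return train, val, test


train_frac, val_frac, test_frac = split_fractions(0.9, 0.78)
print(f"train={train_frac:.3f}, val={val_frac:.3f}, test={test_frac:.3f}")
# prints train=0.702, val=0.100, test=0.198
```

So a train_size of 0.78 on the second call is what turns the remaining 90% into roughly 70% for training and 20% for testing.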

Now we’re ready to begin modeling. Refer to my previous blog to get a deep dive into the initial CNN setup. We will start on the second model assuming our first turned out like the image above. We will use the techniques below:

Regularization
Weight Initialization
Dropout Regularization
Weight Constraints
Other

Regularization

Regularization penalizes complex models by adding a term to the loss that grows with the size of the weights, so training has to minimize both the error and the complexity. This pushes our neural network toward a simpler solution. Here we will use an L2 regularizer, as it is the most common choice and tends to be more stable than an L1 regularizer. We’ll add a regularizer to the second and third layers of our network with a regularization factor of 0.01. (This value is the penalty strength passed to l2(), not a learning rate.)

from tensorflow.keras import layers, regularizers  # assuming TensorFlow's bundled Keras

# Hidden Layer 1
model2.add(layers.Conv2D(64, (4, 4), activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model2.add(layers.MaxPooling2D((2, 2)))

# Hidden Layer 2
model2.add(layers.Conv2D(128, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model2.add(layers.MaxPooling2D((2, 2)))
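
To see what the 0.01 passed to l2() contributes, here is a minimal sketch of the penalty in plain Python. It assumes the usual L2 form, the factor times the sum of the squared weights; the weight values and the data_loss number are made up for illustration.

```python
def l2_penalty(weights, factor=0.01):
    """L2 penalty added to the loss: factor * sum of squared weights."""
    return factor * sum(w * w for w in weights)


small_weights = [0.1, -0.2, 0.05]
large_weights = [1.5, -2.0, 0.8]
data_loss = 0.30  # hypothetical loss from the predictions alone

total_small = data_loss + l2_penalty(small_weights)
total_large = data_loss + l2_penalty(large_weights)
print(total_small, total_large)
```

The layer with large weights pays a much bigger penalty, so the optimizer is nudged toward smaller weights and therefore a simpler model.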

Weight Initialization

Weight initialization sets the starting values of the weights for all neurons before the training process begins. Choosing good starting weights is crucial because we want to get as close as possible to the global minimum of our cost function in a reasonable amount of time. In this iteration of our model we will use He initialization, which is designed for layers with ReLU activations:

# Input Layer of the 3rd Model
model3.add(layers.Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', input_shape=(96, 96, 3)))
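
For intuition, He initialization draws weights from a zero-centered distribution whose standard deviation is sqrt(2 / fan_in), where fan_in is the number of inputs feeding each unit. The sketch below is a simplification: it uses an ordinary normal distribution from Python's random module, whereas Keras documents he_normal as a truncated normal, and the helper names are hypothetical.

```python
import math
import random


def he_normal_stddev(kernel_height, kernel_width, in_channels):
    """Standard deviation used by He initialization for a conv layer."""
    fan_in = kernel_height * kernel_width * in_channels
    return math.sqrt(2.0 / fan_in)


def he_normal_sample(n, stddev, seed=0):
    """Draw n weights from a zero-centered normal with the given stddev."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, stddev) for _ in range(n)]


# 3x3 kernel over 3 input channels (RGB), 32 filters, as in the input layer
stddev = he_normal_stddev(3, 3, 3)
weights = he_normal_sample(3 * 3 * 3 * 32, stddev)
print(round(stddev, 4), len(weights))
```

Layers with more inputs get a smaller spread, which helps keep the scale of the activations roughly steady from layer to layer.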

Dropout Regularization

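Dropout regularization ignores a random subset of units in a layer by setting their outputs to zero during each training step. Because the network cannot rely on any single unit, it is pushed to learn more robust features. Dropout is only active during training; at prediction time every unit is used.

In this model I use a rate of 0.4 after each convolutional block and 0.2 before the output layer. Treat these as starting points to tune rather than universal ideals. See below: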

import random
from tensorflow.keras.layers import Dropout
from tensorflow.keras.models import Sequential

random.seed(123)  # Establish consistency in results

model4 = Sequential()  # Instantiate the 4th Model
model4.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(96, 96, 3)))
model4.add(layers.MaxPooling2D((2, 2)))
model4.add(Dropout(0.4))

model4.add(layers.Conv2D(64, (4, 4), activation='relu'))
model4.add(layers.MaxPooling2D((2, 2)))
model4.add(Dropout(0.4))

# Flattening: convert the 2D feature maps to a 1D vector
model4.add(layers.Flatten())
model4.add(layers.Dense(512, activation='relu'))
model4.add(Dropout(0.2))
model4.add(layers.Dense(1, activation='sigmoid'))
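
What a dropout layer does to its inputs can be sketched in plain Python. This is a minimal illustration of "inverted" dropout, where the surviving values are scaled up by 1 / (1 - rate) so the expected sum stays the same; the function below is a hypothetical stand-in, not the Keras implementation.

```python
import random


def dropout(activations, rate, training=True, seed=None):
    """Zero each activation with probability `rate` during training and
    scale the survivors by 1 / (1 - rate). At inference, pass through."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


activations = [1.0] * 1000
dropped = dropout(activations, rate=0.4, seed=123)
zero_fraction = sum(1 for v in dropped if v == 0.0) / len(dropped)
print(round(zero_fraction, 2))  # close to the 0.4 rate
```

With training=False the values come back unchanged, which mirrors how dropout is switched off when the model makes predictions.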

Weight Constraints

A weight constraint checks the size of the network weights and rescales them if the size exceeds a predefined limit. The constraint is applied after each weight update during training. Below we are using the constraint unit_norm, which forces the weights incident to each unit to have a norm (magnitude) of 1.0.

from tensorflow.keras.constraints import unit_norm  # assuming TensorFlow's bundled Keras

model5.add(layers.Conv2D(32, (3, 3), activation='relu', kernel_constraint=unit_norm(), input_shape=(96, 96, 3)))
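
The rescaling itself is simple: divide each weight by the norm of the whole weight vector. The sketch below shows this for a single unit's weights in plain Python; apply_unit_norm is a hypothetical helper, and the small epsilon is an assumption added to avoid dividing by zero.

```python
import math


def apply_unit_norm(weights, epsilon=1e-7):
    """Rescale a weight vector so that its L2 norm is (almost exactly) 1.0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / (epsilon + norm) for w in weights]


rescaled = apply_unit_norm([3.0, 4.0])  # original norm is 5.0
new_norm = math.sqrt(sum(w * w for w in rescaled))
print([round(w, 3) for w in rescaled], round(new_norm, 3))
```

The direction of the weight vector is preserved; only its length is pulled back to 1.0 after an update.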

Other

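If all else fails, you can increase the size of your training set by generating more data through augmentation. The generator below is a template with every transformation switched off, so it passes the images through unaltered; raise values such as rotation_range, or set horizontal_flip to True, to produce new variations of each image: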

datagen = ImageDataGenerator(rotation_range = 0,
                             width_shift_range = 0,
                             height_shift_range = 0,
                             rescale = None,
                             shear_range = 0,
                             zoom_range = 0,
                             horizontal_flip = False,
                             fill_mode = 'nearest')
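
As a rough picture of what a setting like horizontal_flip does once it is enabled, here is a small sketch that mirrors tiny images stored as nested lists. It is a simplification under stated assumptions: the generator applies its transformations randomly while batches are produced, whereas this hypothetical helper simply appends a flipped copy of every image.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def augment_with_flips(images):
    """Return the original images followed by their mirrored copies."""
    return images + [horizontal_flip(img) for img in images]


# Two made-up 2x3 grayscale "images"
images = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [0, 1, 2]],
]
augmented = augment_with_flips(images)
print(len(images), len(augmented))  # prints 2 4
```

Each flipped image keeps its original label, so the model sees twice as many labeled examples without any new photos being collected.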

Another route is to increase the resolution of all of the photos by increasing the size. You can do this by calling new image data generators for the train, validation, and test datasets. See below how I increased the dimensions of the photos from (96 x 96) to (128 x 128):

# Import the Original Training Dataset
train_gen2 = ImageDataGenerator(rescale=1./255).flow_from_directory(train_dir, target_size=(128,128), batch_size=15200)

# Import the Original Validation Dataset
val_gen2 = ImageDataGenerator(rescale=1./255).flow_from_directory(val_dir, target_size=(128,128), batch_size=16)

# Import the Original Testing Dataset
test_gen2 = ImageDataGenerator(rescale=1./255).flow_from_directory(test_dir, target_size=(128,128), batch_size=624)

About the Author

Erica Gabriel

Data Rules Everything Around Me (D.R.E.A.M). Mechanical Engineer & Project Manager turned Data Scientist, using data to build equitable and sustainable solutions to help the systemically oppressed.

