Basics of CNN in Deep Learning

Debasish Kalita Last Updated : 08 Jan, 2025


In this article, we look at the fundamental principles and components of Convolutional Neural Networks (CNNs): their architecture, their core operations, and their impact on deep learning. Whether you are a novice eager to grasp the essentials or a seasoned practitioner looking to deepen your knowledge, this overview of the basics of CNNs in deep learning should give you a solid starting point.

This article was published as a part of the Data Science Blogathon

What is a Convolutional Neural Network?
Convolutional Layer
Padding and Stride
Pooling
ReLU
Basic Python Implementation
Conclusion
Frequently Asked Questions

What is a Convolutional Neural Network?

Convolutional Neural Networks, also known as CNNs or ConvNets, are a type of feed-forward artificial neural network whose connectivity structure is inspired by the organization of the animal visual cortex. Small clusters of cells in the visual cortex are sensitive to certain areas of the visual field. Individual neurons respond, or fire, only when edges of a certain orientation are present: some activate when shown vertical edges, while others fire when shown horizontal or diagonal edges. In deep learning, CNNs are used mainly to analyze visual information, but they can handle a wide range of tasks involving images, sounds, text, videos, and other media. Yann LeCun and his colleagues at Bell Labs built the first successful convolutional networks in the late 1980s and 1990s.


Convolutional Neural Networks (CNNs) have an input layer, an output layer, numerous hidden layers, and often millions of parameters, allowing them to learn complicated objects and patterns. Convolution and pooling operations extract features from the input and sub-sample it, with an activation function applied along the way. The hidden layers are only partially connected, and a fully connected layer at the end produces the output. Within the convolutional part of the network, the output of a layer keeps the grid-like shape of the input image, although pooling and strides usually make it smaller.

Convolution is the process of combining two functions to produce a third function. In CNNs, the input image is convolved with filters, resulting in a feature map. Filters are sets of weights and biases that are initialized randomly and then learned during training. Instead of having individual weights and biases for each neuron, a CNN uses the same weights and biases for all neurons in a feature map. Many filters can be created, each of which captures a different aspect of the input. Kernels are another name for filters.

Convolutional Layer

In convolutional neural networks (CNNs), the primary components are convolutional layers. These layers typically involve input vectors, such as an image, filters (or feature detectors), and output vectors, often referred to as feature maps. As the input, such as an image, traverses through a convolutional layer, it undergoes abstraction into a feature map, also known as an activation map. This process involves the convolution operation, which enables the detection of more complex features within the image.

Additionally, rectified linear units (ReLU) commonly serve as activation functions within these layers to introduce non-linearity into the network. Furthermore, CNNs often employ pooling operations to reduce the spatial dimensions of the feature maps, leading to a more manageable output volume. Overall, convolutional layers play a crucial role in extracting meaningful features from the input data, making them fundamental in tasks such as image classification and natural language processing, among others.

Feature Map = Input Image * Feature Detector (where * denotes convolution, not ordinary multiplication)

Convolutional layers convolve the input and pass the output to the next layer. This is analogous to a neuron’s response to a single stimulus in the visual cortex. Each convolutional neuron processes data only for its assigned receptive field.

In mathematics, a convolution is an operation that combines two functions to produce a third. In CNNs, convolution occurs when two matrices (rectangular arrays of numbers arranged in columns and rows) are combined to generate a third matrix.

In the convolutional layers of a CNN, these convolutions filter input data to extract information.


Position the kernel’s center element above the source pixel. Then, replace the source pixel with a weighted sum of itself and its neighboring pixels.
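The sliding weighted sum described above can be sketched in a few lines of NumPy. This is a minimal illustration, not an optimized implementation: the function name `conv2d_valid`, the 4x4 image, and the vertical-edge kernel are all hypothetical examples chosen for this sketch. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A 4x4 image whose left half is dark (0) and right half is bright (1).
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A hypothetical 3x3 vertical-edge detector.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Every position in the resulting 2x2 feature map has a positive response, because each 3x3 patch of this image contains the dark-to-bright vertical edge that the kernel looks for.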

Parameter sharing and local connectivity are two principles used in CNNs. In a feature map, all neurons share weights, which defines parameter sharing. Local connection means each neuron connects only to a part of the input image, unlike a fully connected neural network where all neurons connect to every input. This reduces the number of parameters in the system and speeds up the calculation.
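To get a feel for how much parameter sharing saves, the sketch below compares parameter counts for a fully connected layer and a convolutional layer. The helper names and the layer sizes (a 28x28 grayscale image, 32 units, 32 filters of size 3x3) are assumptions picked only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Hypothetical layer sizes for a 28x28 grayscale image.
fc = dense_params(28 * 28, 32)    # every unit connects to every pixel
conv = conv_params(3, 3, 1, 32)   # 32 shared 3x3 filters
print(fc, conv)
```

With these sizes the fully connected layer needs 25,120 parameters while the convolutional layer needs only 320, and the convolutional count would stay the same for a larger image.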

Padding and Stride

Padding and stride have an impact on how the convolution procedure is carried out. They can be used to increase or decrease the dimensions (height and width) of input/output vectors.

Padding describes how many pixels are added around the border of an image before a CNN kernel processes it. With zero padding, every added pixel has a value of zero. If you set the zero padding to one, a one-pixel border with a zero value will surround the image.


Padding works by increasing the processing region of a convolutional neural network. The kernel is a filter that moves across the image, scanning each pixel and turning the data into a smaller or larger format. Adding padding to the image frame gives the kernel more room to cover the image, so pixels near the border are processed as thoroughly as those in the center. Adding padding to a CNN-processed image therefore allows for more accurate image analysis.
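As a small illustration of zero padding, the sketch below uses NumPy's `np.pad` to surround a 3x3 image with a one-pixel border of zeros. The image values are arbitrary and chosen only for this example.

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Zero padding of one: add a one-pixel border of zeros on every side.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)
print(padded)
```

The padded array is 5x5, with the original 3x3 values in the middle, so a 3x3 kernel can now be centered on every original pixel, including those on the border.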

Stride determines how the filter convolves over the input matrix, i.e. how many pixels it shifts at each step. When you set the stride to 1, the filter moves across one pixel at a time, and when you set the stride to 2, the filter moves across two pixels at a time. The larger the stride value, the smaller the output, and vice versa.
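The combined effect of padding and stride on the output size follows the standard formula floor((n + 2p - k) / s) + 1 for an input of width n, kernel size k, padding p, and stride s. The sketch below applies it to a few cases. The function name and the 28-pixel input are assumptions made for this example.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output width (or height) of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# A 28-pixel-wide input with a 3x3 kernel under different settings.
no_padding = conv_output_size(28, kernel=3, padding=0, stride=1)
same_size = conv_output_size(28, kernel=3, padding=1, stride=1)
strided = conv_output_size(28, kernel=3, padding=1, stride=2)
print(no_padding, same_size, strided)
```

Without padding the output shrinks to 26, a padding of one keeps it at 28, and raising the stride to 2 halves it to 14.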

Pooling

The purpose of pooling is to gradually shrink the representation’s spatial size to reduce the number of parameters and computations in the network. The pooling layer treats each feature map separately.


The following are some methods for pooling:

Max-pooling: It keeps the largest element from each region of the feature map, so the resulting max-pooled layer retains the most prominent features. It is the most popular method because it tends to work well in practice.
Average pooling: It calculates the average of each region of the feature map.

Pooling gradually reduces the spatial size of the representation, which cuts the number of parameters and computations in the network and also helps prevent overfitting. Without pooling or strides larger than 1, and with suitable padding, the output would keep the same resolution as the input.
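Both pooling methods can be sketched with NumPy as follows. This is a simplified version that assumes non-overlapping windows (stride equal to the window size). The function name `pool2d` and the sample feature map are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride equals size)."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a complete window.
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1, 3, 2, 0],
                        [5, 7, 1, 4],
                        [0, 2, 9, 8],
                        [6, 4, 3, 5]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Each 2x2 region of the 4x4 input collapses to a single number, so both results are 2x2: max-pooling keeps the strongest response in each region, while average pooling smooths the region into its mean.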

ReLU

The rectified linear activation function, or ReLU for short, is a piecewise linear function that, if the input is positive, outputs the input directly; else, it outputs zero. Because a model that utilizes it is quicker to train and generally produces higher performance, it has become the default activation function for many types of neural networks.
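In code, ReLU is a one-liner. The sketch below uses NumPy and a handful of arbitrary sample values.

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged and replace the rest with zero."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)
print(activated)
```

The negative inputs become zero, while 1.5 and 3.0 pass through unchanged.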

At the end of a CNN, there is a fully connected layer of neurons. As in conventional neural networks, neurons in a fully connected layer have full connections to all activations in the previous layer and work similarly. After training, the fully connected layer turns the extracted features into the scores used to classify images into distinct categories. Every activation unit in the next layer connects to all inputs from this layer. Because a large share of a network's parameters typically sits in the fully connected layers, these layers are prone to overfitting, which can be reduced using various strategies, including dropout.

Soft-max is an activation layer that is typically applied to the network’s last layer, which serves as a classifier. This layer is responsible for categorizing provided input into distinct types. A network’s non-normalized output is mapped to a probability distribution using the softmax function.
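The mapping from raw scores to probabilities can be sketched as follows. The `logits` values are made-up scores for three classes, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in np.exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical non-normalized outputs
probs = softmax(logits)
print(probs, probs.sum())
```

The class with the largest raw score receives the largest probability, and the predicted class is simply the index of that maximum.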

Basic Python Implementation

Importing Some Relevant Libraries

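import numpy as np
# %matplotlib inline  (uncomment when running in a Jupyter notebook)
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

# Fix the random seed so that runs are repeatable
tf.random.set_seed(2019)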

Loading the MNIST Dataset

(X_train,Y_train),(X_test,Y_test) = keras.datasets.mnist.load_data()

Scaling our Data

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

# Flatten each 28x28 image into a 784-element vector
X_train_flattened = X_train.reshape(len(X_train), 28 * 28)
X_test_flattened = X_test.reshape(len(X_test), 28 * 28)

Designing Neural Network

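# A single dense layer with 10 outputs, one per digit class.
# Note that this baseline has no convolutional layers yet.
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(784,), activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for five passes over the flattened training images
model.fit(X_train_flattened, Y_train, epochs=5)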

Output

Epoch 1/5
1875/1875 [==============================] - 8s 4ms/step - loss: 0.7187 - accuracy: 0.8141
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.3122 - accuracy: 0.9128
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2908 - accuracy: 0.9187
Epoch 4/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2783 - accuracy: 0.9229
Epoch 5/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2643 - accuracy: 0.9262
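The model above is a single dense layer on flattened pixels, so it does not yet use any of the convolution, pooling, or ReLU building blocks discussed earlier. As a rough sketch of how those pieces could fit together in Keras, the example below defines a small CNN for the same 28x28 MNIST images. The function name `build_cnn`, the number of filters, and the layer arrangement are illustrative assumptions, not settings from the original experiment.

```python
from tensorflow import keras

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """A small illustrative CNN: two conv/pool stages and a softmax classifier."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

cnn = build_cnn()
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.summary()

# To train on the scaled (not flattened) images, add a channel axis first:
# cnn.fit(X_train[..., None], Y_train, epochs=5)
```

Unlike the dense model, this network takes the images in their original 2D layout with an extra channel dimension, so the flattening step is handled inside the model after the convolutional stages.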

How Do Convolutional Layers Work?

Sliding Filters: Imagine a small window sliding over an image. This window has some numbers in it called weights. As it moves, it multiplies these weights with the numbers in the image underneath, and adds them up to make a new number. Convolution layers extract features.

Finding Patterns: By adjusting these weights, the window learns to recognize patterns like edges or textures. For example, it might learn to detect a horizontal line or a diagonal edge.

Sharing Knowledge: Instead of having different windows all over the image, we use the same window everywhere. This saves a lot of memory and helps the network learn faster. Convolution neural networks utilize this technique.

Building a Picture: As we slide these windows over the image, we build up a new picture. Each new picture highlights different patterns that we’ve learned. This process is crucial for image recognition and computer vision tasks.

Making Things Smaller: Sometimes, we don’t need all the details. So, we shrink the picture by combining nearby numbers. This makes things faster and helps us focus on the most important parts. This is particularly useful in medical image analysis.

Adding Some Curves: After all these operations, we apply a simple rule to make our picture more expressive. This helps us capture complicated relationships between the patterns we’ve found. This step is common in convolutional neural networks and other deep learning models.

By repeating these steps with different patterns and pictures, we can teach a computer to recognize all sorts of things in images, like cats, cars, or even emotions on people’s faces! This involves earlier layers learning basic features and later layers combining them to recognize entire images.

Conclusion

The goal of this article was to provide an overview of convolutional neural networks and their main applications. These networks generally produce excellent classification and recognition results on images, and they are also used to process audio, text, and video. If the task at hand is to find local patterns in structured data such as images or sequences, CNNs are an excellent choice.


Frequently Asked Questions

Q1. What are the basics of CNN?

A. Convolutional Neural Networks (CNNs) are a class of deep learning models designed for image processing. They employ convolutional layers to automatically learn hierarchical features from input images.

Q2. What is the basic principle of CNN?

A. The basic principle of CNN lies in feature learning through convolutional layers. These layers apply filters to input data, extracting meaningful features and capturing spatial hierarchies for accurate pattern recognition.

Q3. What are the 4 components of CNN?

A. The four key components of CNN are convolutional layers, pooling layers, fully connected layers, and activation functions. These elements work together to enable feature extraction, dimension reduction, and classification in image data.

Q4. What are the basic operations of CNN?

A. CNN operations include convolution, where filters detect features, pooling to downsample and retain essential information, flattening to convert data for fully connected layers, and activation functions for introducing non-linearity in the model’s learning process.

The media shown in this article does not belong to Analytics Vidhya and the Author uses it at their discretion.

Debasish Kalita

A graduate in Computer Science and Engineering from Tezpur Central University. Currently, I am pursuing my M.Tech in Computer Science and Engineering in the Department of CSE at NIT Durgapur, and I expect to complete my postgraduate degree in spring 2022. A grounded and solution-oriented computer engineer with a wide variety of experiences. Adept at motivating self and others. Passionate about programming and educating the next generation of technology users and innovators.
