How to Approach CNN Architecture from Scratch?

Premanand S Last Updated : 01 Jul, 2022

This article was published as a part of the Data Science Blogathon.

Introduction on CNN Architecture

Hello, and welcome to another intriguing subject. With the large quantity of data now available, particularly in the form of photographs and videos, the need for Deep Learning is growing by the day. Many advanced architectures have been designed for diverse objectives, but the Convolutional Neural Network (CNN) is the foundation for most of them. So that’ll be the topic of today’s piece.

Deep Learning

Deep learning is a subfield of machine learning and artificial intelligence (AI) that is loosely modeled on how people learn. It is a significant component of data science, which also covers statistics and predictive modeling. For data scientists who must collect, analyze, and interpret massive volumes of data, deep learning is particularly useful since it speeds up and simplifies the process.

Layman’s Explanation of Deep Learning

I hope you all enjoyed Inception (2010), a film whose idea and technology are both baffling and exciting. The film’s major theme is that a dream may be used to implant a thought in a person’s subconscious mind, which subsequently influences their behavior. This is accomplished through a creative notion known as shared dreaming. Simply put, this is what “deep learning” does (inception on a machine rather than a person).

Using the Inception timeline as an analogy, you go down several layers deep into the machine’s neural structure (technically called forward propagation; in the diagram, these are the different levels of dreams) and perform a kick (backpropagation) to reinforce the learning, depending on what you want to achieve. For inception, the neural nodes (shared-state dreamers) use an activation function/architect (ReLU, sigmoid, and others).

Figure: Layman's explanation of Deep Learning

On rare occasions, the nodes/people in the current layer may be assassinated and go into limbo (the vanishing gradient problem), jeopardizing the entire inception (learning) process. Fortunately, a skilled pharmacist (a bias term or leaky ReLU) may be able to provide the therapy needed to avoid it. You will probably have to repeat this procedure (a training epoch) numerous times before you achieve convergence, which means the system is behaving as intended.

You’ll need a feature architect, just as you’d need an architect to design your dream landscape. As the inception process goes, you seed the machine with a feature, and it continues to build more and more intricate features. Each hidden layer creates features from its input. As you go through the layers, these features become more sophisticated. As a result, you essentially perform inception by probing deep into the machine’s psyche. The system then “learns” to recognize faces, handwriting, optical characters, and many other kinds of objects.

Why do we go from Machine Learning to Deep Learning?

In Machine Learning approaches, most of the relevant features must be identified by a domain expert in order to reduce the complexity of the data and make patterns more visible to the learning algorithm. The main advantage of Deep Learning algorithms is that they try to learn high-level features from the data incrementally. As a result, the need for domain expertise and hand-crafted feature extraction is reduced.

Deep Learning approaches tend to solve a problem end to end, whereas Machine Learning strategies require breaking the problem statement down into distinct parts and then combining their results at the end. For example, in a multiple object detection task, a deep learning system such as YOLO takes a photo as input and outputs the positions and names of the objects directly. With a Machine Learning method such as SVM, a bounding box detection algorithm must first identify all candidate objects, and HOG (histogram of oriented gradients) features are then passed to the learning algorithm to recognize the relevant objects.

Figure: Machine Learning versus Deep Learning

Types of Deep Learning

There are many types, depending on the application and the architecture:

Artificial Neural Network
Convolution Neural Network
Recurrent Neural Network
Autoencoders
Self Organizing Map
Multi-layer Perceptron and many more…

Convolution Neural Network:

Simply put, a convolutional neural network is a deep learning architecture that learns features directly from the input, without the need for manual feature extraction. Please refer to this blog for a detailed explanation.

Layers in Convolution Neural Network

CNN has certain building components for constructing the architecture, such as

Convolution layer

The convolutional layer, where the majority of the computation happens, is the foundation of a CNN. It requires input data, a filter, and a feature map, among other things. Assume the input is a color picture, stored as a 3D matrix of pixels. This means the input has three dimensions (height, width, and depth), where the depth corresponds to the image’s RGB channels. A feature detector, also known as a kernel or a filter, moves across the image’s receptive fields and checks whether the feature is present. This process is called convolution.
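
The sliding-filter idea above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a single-channel image, stride 1, and no padding. The function name `conv2d_valid`, the 4x4 image, and the edge-detecting kernel are all hypothetical. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply the kernel with the receptive field element-wise, then sum.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 single-channel image with a vertical edge in the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3x3 vertical-edge detector.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[3, 3], [3, 3]]
```

Every position of the 2x2 feature map responds strongly because each 3x3 window straddles the edge.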

When the filter doesn’t fit the input image, zero padding is used. All elements outside the input matrix are set to zero, which produces an output that is the same size as, or larger than, the unpadded result. Padding comes in three varieties:

Valid Padding: Also known as no padding. If the dimensions do not align, the final convolution is discarded.

Same Padding: This padding guarantees that the output layer matches the input layer in size.

Full Padding: By padding the input with zeros, this form of padding increases the size of the output.
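
To make the three padding modes concrete, here is a small sketch of the usual output-size formula, floor((n + 2p - k) / s) + 1, along one dimension. The helper name `conv_output_size` is hypothetical, and the "same" branch assumes an odd kernel size and stride 1.

```python
def conv_output_size(n, k, padding="valid", stride=1):
    """Output length along one dimension for input size n and kernel size k."""
    if padding == "valid":
        p = 0                 # no zeros added
    elif padding == "same":
        p = (k - 1) // 2      # keeps the size unchanged for odd k and stride 1
    elif padding == "full":
        p = k - 1             # every partial overlap of kernel and input is kept
    else:
        raise ValueError(f"unknown padding: {padding}")
    return (n + 2 * p - k) // stride + 1


for mode in ("valid", "same", "full"):
    print(mode, conv_output_size(128, 3, mode))
# valid 126
# same 128
# full 130
```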

Activation layer

A neuron first computes a weighted sum of its inputs and adds a bias to it. The activation function is then applied to this value to determine whether, and how strongly, the neuron should be activated. The activation function’s purpose is to make a neuron’s output non-linear.
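
As a rough sketch of this step, the snippet below computes a weighted sum plus bias for a single neuron and passes it through three common activation functions. The inputs, weights, and bias are made-up values, and the helper name `neuron` is hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    # Lets a small gradient through for negative inputs instead of a hard zero.
    return z if z > 0 else alpha * z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical values: z = 1.0*0.5 + 2.0*(-1.0) + 0.5 = -1.0
inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5
print(neuron(inputs, weights, bias, relu))        # 0.0
print(neuron(inputs, weights, bias, leaky_relu))  # -0.01
print(neuron(inputs, weights, bias, sigmoid))     # about 0.269
```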

Pooling layer

The pooling layer, also known as downsampling, is a dimensionality reduction technique that shrinks the feature maps and so reduces the number of parameters in the layers that follow. Like the convolutional layer, the pooling operation sweeps a filter across the whole input, but this filter has no weights. Instead, the kernel applies an aggregation function to the values in the receptive field to fill the output array.

Max pooling: The filter picks the pixel with the highest value to transmit to the output array as it advances across the input. In comparison to average pooling, this strategy is employed more frequently.

Average pooling: The filter calculates the average value inside the receptive field as it passes across the input and sends it to the output array.
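
Both pooling variants can be sketched with one small helper. This is a minimal illustration that assumes non-overlapping windows (the stride equals the window size); the name `pool2d` and the 4x4 feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: window and stride are both `size`."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current receptive field.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fm = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="avg")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [4.0, 5.25]]
```

In both cases the 4x4 input shrinks to 2x2, and no weights are learned.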

Flattening and Fully Connected Network

The last stage of a convolutional neural network (CNN) is a classifier. It is made of dense layers, and it is simply an artificial neural network (ANN) classifier.

An ANN classifier, like any other classifier, requires features as input. This means that a feature vector is required.

As a result, you must convert the output of the convolutional component of the CNN into a 1D feature vector that the ANN can use. This procedure is known as flattening. It flattens all of the structure of the convolutional layers’ output into a single long feature vector that the dense layer may use for classification.
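
Flattening itself is a simple reshaping step. The sketch below unrolls a tiny height x width x channels volume, represented as nested lists, into one vector; the helper name `flatten` and the 2x2x3 values are hypothetical.

```python
def flatten(volume):
    """Unroll a height x width x channels nested list into one 1D feature vector."""
    return [value for row in volume for pixel in row for value in pixel]


# Hypothetical 2 x 2 feature volume with 3 channels.
volume = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vector = flatten(volume)
print(len(vector), vector)  # 12 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

No values are changed or discarded; only the spatial arrangement is dropped.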

Advantages

More efficient to train in terms of computation
Very high accuracy in image recognition problems
Automatically detects the important features without any human supervision.

Disadvantages

Vulnerable to adversarial attacks
Data-intensive training

Applications of CNN

Image classification
Object detection
Audiovisual matching
Object reconstruction
Speech recognition

CNN Using Image

Original Dataset

Kaggle Dataset

!wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

#for unzipping the dataset
!unzip kagglecatsanddogs_3367a.zip

After downloading the dataset, we import some basic libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.preprocessing.image import load_img
import warnings
import seaborn as sns
import os
import random
warnings.filterwarnings('ignore')

To make further processing smoother, we first collect the path and label of every image; these lists are later converted to a data frame:

#Cat - 0 and Dog -1

input_path = []
label = []

for class_name in os.listdir("PetImages"):
    for path in os.listdir("PetImages/"+class_name):
        if class_name == 'Cat':
            label.append(0)
        else:
            label.append(1)
        input_path.append(os.path.join("PetImages", class_name, path))
print(input_path[20000], label[20000])

To cross-check, we print another entry:

print(input_path[2], label[2])

Next, we build the data frame, shuffle it, and look at some basic statistics:

len(input_path)
dataset = pd.DataFrame()
dataset['images'] = input_path
dataset['label'] = label
dataset = dataset.sample(frac=1).reset_index(drop=True)
dataset.head()
dataset.tail()
dataset.shape
dataset.info()

Before deleting junk or corrupted files, we first find every file that cannot be opened as an image:

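# find files that cannot be opened as images (junk or corrupted)
from PIL import Image
l = []
for image in dataset['images']:
    try:
        img = Image.open(image)
    except Exception:
        # collect the path of any file Pillow cannot open
        l.append(image)
l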

We then delete the unwanted files found above:

# delete db files
dataset = dataset[dataset['images']!='PetImages/Dog/Thumbs.db']
dataset = dataset[dataset['images']!='PetImages/Cat/Thumbs.db']
dataset = dataset[dataset['images']!='PetImages/Cat/666.jpg']
dataset = dataset[dataset['images']!='PetImages/Dog/11702.jpg']
len(dataset)
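
As an alternative to hard-coding file names, non-image files such as Thumbs.db can be filtered out by extension before the data frame is built. The following is only a sketch: the helper `is_image_path`, the set `IMAGE_EXTENSIONS`, and the sample paths are assumptions for illustration. It does not detect corrupted .jpg files, which still need the open check shown earlier.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of wanted formats


def is_image_path(path):
    """True when the file extension looks like an image (case-insensitive)."""
    return os.path.splitext(path)[1].lower() in IMAGE_EXTENSIONS


# Hypothetical sample paths.
paths = ["PetImages/Dog/1.jpg", "PetImages/Dog/Thumbs.db", "PetImages/Cat/2.JPG"]
kept = [p for p in paths if is_image_path(p)]
print(kept)  # ['PetImages/Dog/1.jpg', 'PetImages/Cat/2.JPG']
```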

EDA – Dog

# to display grid of images
plt.figure(figsize=(25,25))
temp = dataset[dataset['label']==1]['images']
start = random.randint(0, len(temp) - 25)  # leave room for a full window of 25 images
files = temp.iloc[start:start+25]  # positional slice

for index, file in enumerate(files):
    plt.subplot(5,5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title('Sample Dogs images')
    plt.axis('off')

EDA – Cat

# to display grid of images
plt.figure(figsize=(25,25))
temp = dataset[dataset['label']==0]['images']
start = random.randint(0, len(temp) - 25)  # leave room for a full window of 25 images
files = temp.iloc[start:start+25]  # positional slice

for index, file in enumerate(files):
    plt.subplot(5,5, index+1)
    img = load_img(file)
    img = np.array(img)
    plt.imshow(img)
    plt.title('Sample Cats images')
    plt.axis('off')

A count plot shows how the data is distributed between the two categories, which hold a roughly equal amount of data:

sns.countplot(x=dataset['label'])

Data Generator for Images

dataset['label'] = dataset['label'].astype('str')

# input split
from sklearn.model_selection import train_test_split
train, test = train_test_split(dataset, test_size=0.3, random_state=42)
from keras.preprocessing.image import ImageDataGenerator
train_generator = ImageDataGenerator(
    rescale = 1./255,  # normalization of images
    rotation_range = 40, # augmentation of images to avoid overfitting
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
)

val_generator = ImageDataGenerator(rescale = 1./255)

train_iterator = train_generator.flow_from_dataframe(
    train, 
    x_col='images', 
    y_col='label', 
    target_size=(128,128), 
    batch_size=512, 
    class_mode='binary'
)
val_iterator = val_generator.flow_from_dataframe(
    test,
    x_col='images',
    y_col='label',
    target_size=(128,128),
    batch_size=512,
    class_mode='binary'
)
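
To give a feel for what two of the generator options do, here is a conceptual sketch of rescaling and horizontal flipping on a tiny grayscale image stored as nested lists. It is not how Keras implements these options; the function names and pixel values are hypothetical.

```python
def rescale(image, factor=1.0 / 255):
    """Map pixel values from the 0..255 range to roughly 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror every row, as a left-right flip would."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 grayscale image.
image = [[0, 51, 255], [102, 204, 153]]
scaled = rescale(image)
flipped = horizontal_flip(image)
print(scaled)
print(flipped)  # [[255, 51, 0], [153, 204, 102]]
```

Rescaling is applied to both training and validation images, while random transformations such as flips are applied only to the training images.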

Modeling

from keras import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
                    Conv2D(16, (3,3), activation='relu', input_shape=(128,128,3)),
                    MaxPool2D((2,2)),
                    Conv2D(32, (3,3), activation='relu'),
                    MaxPool2D((2,2)),
                    Conv2D(64, (3,3), activation='relu'),
                    MaxPool2D((2,2)),
                    Conv2D(128, (3,3), activation='relu'),
                    MaxPool2D((2,2)),
                    Flatten(),
                    Dense(256, activation='relu'),
                    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
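
As a cross-check on the model summary, the feature map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used above: no padding and stride 1 for Conv2D, and a pooling stride equal to the pool size for MaxPool2D. The helper names are hypothetical.

```python
def conv_out(n, k=3):
    return n - k + 1    # 'valid' padding, stride 1


def pool_out(n, size=2):
    return n // size    # 2x2 max pooling, stride 2


size, channels, total_params = 128, 3, 0
for filters in (16, 32, 64, 128):
    # 3x3 weights per input channel, plus one bias, for every filter
    params = (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters
    total_params += params
    print(f"conv({filters}) + pool -> {size}x{size}x{channels}, params={params}")

flat = size * size * channels    # length of the flattened feature vector
dense1 = (flat + 1) * 256        # Dense(256): weights + biases
dense2 = (256 + 1) * 1           # Dense(1): weights + bias
total_params += dense1 + dense2
print(f"flatten -> {flat}, dense params={dense1} + {dense2}, total={total_params}")
```

Under these assumptions the spatial size goes 128, 63, 30, 14, 6, the flattened vector has 4608 entries, and the first dense layer holds most of the 1,277,601 parameters.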

history = model.fit(train_iterator, epochs=15, validation_data=val_iterator)
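
The `binary_crossentropy` loss passed to `compile` compares each predicted probability from the sigmoid output with the true 0/1 label. A minimal sketch of the formula in plain Python follows; the clipping constant `eps` and the sample predictions are assumptions for illustration.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)    # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels (0 = cat, 1 = dog) and predictions.
y_true = [0, 1, 1, 0]
confident = [0.1, 0.9, 0.8, 0.2]    # mostly right and fairly sure
unsure = [0.5, 0.5, 0.5, 0.5]       # always predicts 50/50
loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

A model that always outputs 0.5 gets a loss of ln 2 (about 0.693), so a training loss near that value means the network has not learned anything yet.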

To visualize the training history, we plot the accuracy and loss curves:

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(len(acc))
plt.plot(epochs, acc, 'b', label='Training Accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation Accuracy')
plt.title('Accuracy Graph')
plt.legend()
plt.figure()
loss = history.history['loss']
val_loss = history.history['val_loss']
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Loss Graph')
plt.legend()
plt.show()
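
One simple way to read the loss graph is to find the epoch where the validation loss is lowest; if the validation loss rises after that point while the training loss keeps falling, the model is likely starting to overfit. The sketch below uses made-up loss values, not results from this model, and the helper name `best_epoch` is hypothetical.

```python
def best_epoch(val_losses):
    """Index (0-based) of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


# Hypothetical validation losses, one per epoch.
val_loss_example = [0.69, 0.62, 0.55, 0.51, 0.50, 0.52, 0.56]
print(best_epoch(val_loss_example))  # 4
```

With the real run, the same function could be applied to `history.history['val_loss']`.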

Figure: Training and validation loss

Conclusion

We’ve come to the end of the topic. The main takeaways: we saw how to download a dataset from the web, what basic preprocessing and data augmentation are needed for image processing, and how to construct a CNN architecture using convolution, max-pooling, flatten, and dense layers. The purpose of this blog is to understand the basic architecture, which leads on to advanced architectures like GAN, YOLO, etc.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Premanand S

Premanand S is a dedicated academic with over a decade of research experience specializing in Bio-signal Processing, Machine Learning, and Deep Learning. He earned his B.Tech in 2009 from Amrita Vishwa Vidyapeetham, Bangalore, and completed his M.E. in 2011 from Rajalakshmi Engineering College, Chennai, where his thesis focused on Deep Learning for ECG Signal Processing.

Currently pursuing his Ph.D. at VIT-Chennai, his research, titled "Deep Learning Approaches for Enhanced ECG Signal Processing and Arrhythmia Classification," aims to leverage cutting-edge deep learning techniques to improve the accuracy and efficiency of ECG signal analysis, contributing significantly to advancements in cardiac health monitoring.

A recipient of the prestigious TCS-RSP (Research Scholarship) in 2014, Cycle 9, Premanand has established himself as a recognized figure in the academic community. He has been invited to deliver talks on Data Science, Machine Learning, and Deep Learning at prominent institutions across India, sharing his expertise and insights with researchers and students alike.

As an Assistant Professor at VIT-Chennai, he continues to mentor and inspire the next generation of researchers while pushing the boundaries of knowledge in his field.
