“How did your neural network produce this result?” This question has sent many data scientists into a tizzy. It’s easy to explain how a simple neural network works, but what happens when you increase the layers 1000x in a computer vision project?
Our clients or end users require interpretability – they want to know how our model got to the final result. We can’t take a pen and paper to explain how a deep neural network works. So how do we shed this “black box” image of neural networks?
By visualizing them! The clarity that comes with visualizing the different features of a neural network is unparalleled. This is especially true when we’re dealing with a convolutional neural network (CNN) trained on thousands or even millions of images.
In this article, we will look at different techniques for visualizing convolutional neural networks. We will also work on extracting insights from these visualizations to tune our CNN model.
Note: This article assumes you have a basic understanding of Neural Networks and Convolutional Neural Networks. Below are three helpful articles to brush up or get started with this topic:
Why Should we use Visualization to Decode Neural Networks?
Setting up the Model Architecture
Accessing Individual Layers of a CNN
Filters – Visualizing the Building Blocks of CNNs
Activation Maximization – Visualizing what a Model Expects
Occlusion Maps – Visualizing what’s important in the Input
Saliency Maps – Visualizing the Contribution of Input Features
Class Activation Maps
Layerwise Output Visualization – Visualizing the Process
Why Should we use Visualization to Decode Neural Networks?
It’s a fair question. There are a number of ways to understand how a neural network works, so why take the less-travelled path of visualization?
Let’s answer this question through an example. Consider a project where we need to classify images of animals, like snow leopards and Arabian leopards. Intuitively, we can differentiate between these animals using the image background, right?
Both animals live in starkly contrasting habitats. The majority of the snow leopard images will have snow in the background while most of the Arabian leopard images will have a sprawling desert.
Here’s the problem – the model will end up classifying snow versus desert images. So, how do we make sure our model has correctly learned the distinguishing features between these two leopard types? The answer lies in visualization.
Visualization helps us see what features are guiding the model’s decision for classifying an image.
There are multiple ways to visualize a model, and we will try to implement some of them in this article.
Setting up the Model Architecture
I believe the best way of learning is by coding the concept. Hence, this is a very hands-on guide and I’m going to dive into the Python code straight away.
We will be using the VGG16 architecture with pretrained weights on the ImageNet dataset in this article. Let’s first import the model into our program and understand its architecture.
We will visualize the model architecture using the ‘model.summary()’ function in Keras. This is a very important step before we get to the model building part. We need to make sure the input and output shapes match our problem statement, hence we visualize the model summary.
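Here is a minimal sketch of this step, assuming the tf.keras API (the exact imports may differ depending on your Keras version):

```python
# Load VGG16 with pretrained ImageNet weights and print its architecture
from tensorflow.keras.applications import VGG16

model = VGG16(weights="imagenet", include_top=True)
model.summary()
```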
Running the above code prints the model summary.
We have a detailed architecture of the model along with the number of trainable parameters at every layer. I want you to spend a few moments going through the above output to understand what we have at hand.
This is important when we are training only a subset of the model layers (feature extraction). We can generate the model summary and ensure that the number of non-trainable parameters matches the layers that we do not want to train.
Also, we can use the total number of trainable parameters to check whether our GPU will be able to allocate sufficient memory for training the model. That’s a familiar challenge for most of us working on our personal machines!
Accessing Individual Layers
Now that we know how to get the overall architecture of a model, let’s dive deeper and try to explore individual layers.
It’s actually fairly easy to access the individual layers of a Keras model and extract the parameters associated with each layer. This includes the layer weights and other information like the number of filters.
Now, we will create dictionaries that map the layer name to its corresponding characteristics and layer weights:
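A minimal sketch of how these dictionaries can be built from model.layers:

```python
# Map each layer's name to its configuration (filter count, kernel size, trainable flag, ...)
layer_config = {layer.name: layer.get_config() for layer in model.layers}

# Map each layer's name to its weights (a list of NumPy arrays: kernel and bias)
layer_weights = {layer.name: layer.get_weights() for layer in model.layers}

# Inspect one convolutional layer
print(layer_config["block5_conv1"])
```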
Did you notice that the trainable parameter for our layer ‘block5_conv1‘ is True? This means we can update the layer weights by training the model further.
Visualizing the Building Blocks of CNNs – Filters
Filters are the basic building blocks of any Convolutional Neural Network. Different filters extract different kinds of features from an image. The below GIF illustrates this point really well:
As you can see, every convolutional layer is composed of multiple filters. Check out the output we generated in the previous section – the ‘block5_conv1‘ layer consists of 512 filters. Makes sense, right?
Let’s plot the first filter of the first convolutional layer of every VGG16 block:
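A minimal sketch that reads the kernels out of the weight dictionary built above; plotting the slice for input channel 0 is an assumption made for illustration:

```python
import matplotlib.pyplot as plt

blocks = ["block1_conv1", "block2_conv1", "block3_conv1",
          "block4_conv1", "block5_conv1"]

fig, axes = plt.subplots(1, len(blocks), figsize=(15, 3))
for ax, name in zip(axes, blocks):
    kernels = layer_weights[name][0]     # shape: (3, 3, in_channels, out_channels)
    first_filter = kernels[:, :, 0, 0]   # 3x3 slice of the first filter
    ax.imshow(first_filter, cmap="gray")
    ax.set_title(name)
    ax.axis("off")
plt.show()
```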
We can see the filters of different layers in the above output. All the filters are of the same shape since VGG16 uses only 3×3 filters.
Visualizing what a Model Expects – Activation Maximization
Let’s use the image below to understand the concept of activation maximization:
Which features do you feel will be important for the model to identify the elephant? Some major ones I can think of:
Tusks
Trunk
Ears
That’s how we instinctively identify elephants, right? Now, let’s see what we get when we try to optimize a random image to be classified as that of an elephant.
We know that every convolutional layer in a CNN looks for similar patterns in the output of the previous layer. The activation of a convolutional layer is maximized when the input consists of the pattern that it is looking for.
In the activation maximization technique, we start from a random input and iteratively update it so that the activation of the chosen unit (here, the output neuron for a class) is maximized. Equivalently, we minimize the negative of that activation as a loss.
How do we do this? We calculate the gradient of this loss with respect to the input, and then update the input accordingly:
Here’s the code for doing this:
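Here is a minimal sketch using plain gradient ascent in tf.keras; the class index, learning rate, and number of steps are assumptions:

```python
import numpy as np
import tensorflow as tf

class_idx = 385  # 'Indian elephant' in the standard ImageNet ordering (assumption)

# Start from a random image and repeatedly nudge it towards the target class
input_img = tf.Variable(np.random.uniform(-1, 1, (1, 224, 224, 3)).astype("float32"))
optimizer = tf.keras.optimizers.Adam(learning_rate=1.0)

for step in range(200):
    with tf.GradientTape() as tape:
        preds = model(input_img)
        loss = -preds[:, class_idx]   # maximizing the class score = minimizing its negative
    grads = tape.gradient(loss, input_img)
    optimizer.apply_gradients([(grads, input_img)])

# Rescale the optimized input to [0, 1] for display
result = input_img.numpy()[0]
result = (result - result.min()) / (result.max() - result.min())
plt.imshow(result)
plt.axis("off")
plt.show()
```

In practice, swapping the final softmax for a linear activation and maximizing the raw class logit tends to give cleaner visualizations than maximizing the softmax probability.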
Our model generated the below output using a random input for the class corresponding to Indian Elephant:
From the above image, we can observe that the model expects structures like a tusk, large eyes, and a trunk. This information is very important for checking the sanity of our dataset. For example, let’s say the model was focusing on background features like trees or long grass because Indian elephants are generally found in such habitats.
Then, using activation maximization, we can figure out that our dataset is probably not sufficient for the task and we need to add images of elephants in different habitats to our training set.
Visualizing what’s Important in the Input – Occlusion Maps
Activation maximization is used to visualize what the model expects in an image. Occlusion maps, on the other hand, help us find out which part of the image is important for the model.
Now, to understand how occlusion maps work, consider a model that classifies cars according to their manufacturer, such as Toyota or Audi:
Can you figure out which company manufactured the above car? Probably not because the part where the company logo is placed has been occluded in the image. That part of the image is clearly important for our classification purposes.
Similarly, to generate an occlusion map, we occlude part of the image and then calculate the probability of the image belonging to its class. If the probability decreases, the occluded part of the image is important for that class; otherwise, it is not.
Here, we record that probability as the value for the corresponding region of the image and then standardize these values to generate a heatmap:
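A simplified sketch of such a function (the 32-pixel patch size and the grey fill value are assumptions):

```python
import numpy as np

def iter_occlusion(image, size=32):
    """Slide a grey square of side `size` over the image and yield
    (x, y, occluded_image) for every position."""
    occlusion_value = 127.5   # grey patch, assuming pixel values in [0, 255]
    height, width = image.shape[0], image.shape[1]
    for y in range(0, height, size):
        for x in range(0, width, size):
            occluded = image.copy()
            occluded[y:y + size, x:x + size, :] = occlusion_value
            yield x, y, occluded
```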
The above code defines a function iter_occlusion that yields copies of the image with different portions masked.
Now, let’s load the image and plot it:
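A minimal sketch; the filename 'car.png' is an assumption:

```python
from tensorflow.keras.preprocessing import image as keras_image

img = keras_image.load_img("car.png", target_size=(224, 224))
img = keras_image.img_to_array(img)   # float array with values in [0, 255]

plt.imshow(img.astype("uint8"))
plt.axis("off")
plt.show()
```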
Next, we calculate the class probability for each masked version of the image and plot the resulting heatmap.
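Here is a minimal sketch that does both: it records the probability of the originally predicted class for every patch position and then plots the standardized heatmap (the patch size matches the generator defined above):

```python
from tensorflow.keras.applications.vgg16 import preprocess_input

occlusion_size = 32

# Class predicted for the unoccluded image
class_idx = np.argmax(model.predict(preprocess_input(img[np.newaxis].copy())))

heatmap = np.zeros((224 // occlusion_size, 224 // occlusion_size))
for x, y, occluded in iter_occlusion(img, size=occlusion_size):
    prob = model.predict(preprocess_input(occluded[np.newaxis]), verbose=0)[0][class_idx]
    heatmap[y // occlusion_size, x // occlusion_size] = prob

# Standardize the probabilities before plotting
heatmap = (heatmap - heatmap.mean()) / (heatmap.std() + 1e-8)
plt.imshow(heatmap, cmap="viridis")
plt.colorbar()
plt.show()
```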
Really interesting. We will now create a mask using the standardized heatmap probabilities and plot it:
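A minimal sketch of the mask; the threshold of -0.5 on the standardized scores is an assumption:

```python
# Regions whose occlusion caused a large drop in probability (low standardized
# score) are the ones the model relies on
mask = (heatmap < -0.5).astype("float32")

# Upsample the coarse 7x7 mask back to the 224x224 input resolution
mask = np.kron(mask, np.ones((occlusion_size, occlusion_size)))

plt.imshow(mask, cmap="gray")
plt.axis("off")
plt.show()
```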
Finally, we will apply the mask to our input image and plot that as well:
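And a minimal sketch of applying that mask to the input image:

```python
# Keep only the pixels inside the mask; everything else is blacked out
masked_image = img * mask[..., np.newaxis]

plt.imshow(masked_image.astype("uint8"))
plt.axis("off")
plt.show()
```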
Can you guess why we’re seeing only certain parts? That’s right – only those parts of the input image that had a significant contribution to its output class probability are visible. That, in a nutshell, is what occlusion maps are all about.
Visualizing the Contribution of Input Features – Saliency Maps
Saliency maps calculate the effect of every pixel on the output of the model. This involves calculating the gradient of the output with respect to every pixel of the input image.
This tells us how the output category changes with respect to small changes in the input image pixels. A positive gradient means that a small increase in that pixel’s value will increase the output value:
These gradients, which have the same shape as the image (the gradient is calculated with respect to every pixel), give us an intuition of where the model’s attention lies.
Let’s see how to generate saliency maps for any image. First, we will read the input image using the below code segment.
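A minimal sketch; the filename 'dog.jpg' is an assumption:

```python
img = keras_image.load_img("dog.jpg", target_size=(224, 224))
img = keras_image.img_to_array(img)

plt.imshow(img.astype("uint8"))
plt.axis("off")
plt.show()
```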
Input Image
Now, we will generate the saliency map for the image using the VGG16 model:
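A minimal sketch that computes a vanilla saliency map directly with a GradientTape:

```python
import tensorflow as tf

x = tf.convert_to_tensor(preprocess_input(img[np.newaxis].copy()))

with tf.GradientTape() as tape:
    tape.watch(x)
    preds = model(x)
    top_class_score = tf.reduce_max(preds[0])   # score of the top predicted class

# Gradient of the top class score with respect to every input pixel
grads = tape.gradient(top_class_score, x)[0]

# Take the maximum gradient magnitude across the colour channels
saliency = tf.reduce_max(tf.abs(grads), axis=-1).numpy()

plt.imshow(saliency, cmap="jet")
plt.axis("off")
plt.show()
```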
We see that the model focuses more on the facial part of the dog. Now, let’s look at the results with guided backpropagation:
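A sketch of guided backpropagation, assuming we can swap the activations of a second VGG16 copy in place: every ReLU is replaced with a version whose gradient is zeroed wherever either the forward activation or the incoming gradient is negative, and the input gradient is then computed exactly as above:

```python
@tf.custom_gradient
def guided_relu(x):
    def grad(dy):
        # Pass the gradient back only where both the activation and the
        # incoming gradient are positive
        return tf.cast(dy > 0, "float32") * tf.cast(x > 0, "float32") * dy
    return tf.nn.relu(x), grad

# Build a second copy of VGG16 and swap its ReLUs for the guided version
guided_model = VGG16(weights="imagenet", include_top=True)
for layer in guided_model.layers:
    if hasattr(layer, "activation") and layer.activation is tf.keras.activations.relu:
        layer.activation = guided_relu

x = tf.convert_to_tensor(preprocess_input(img[np.newaxis].copy()))
with tf.GradientTape() as tape:
    tape.watch(x)
    score = tf.reduce_max(guided_model(x)[0])

guided_grads = tape.gradient(score, x)[0].numpy()
guided_saliency = np.max(np.abs(guided_grads), axis=-1)

plt.imshow(guided_saliency, cmap="jet")
plt.axis("off")
plt.show()
```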
Guided backpropagation truncates all the negative gradients to 0, which means that only the pixels with a positive influence on the class probability are highlighted.
Class Activation Maps (Gradient Weighted)
Class activation maps are another neural network visualization technique, based on the idea of weighting the activation maps according to their gradients, i.e., their contribution to the output.
The following excerpt from the Grad-CAM paper gives the gist of the technique:
Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say logits for ‘dog’ or even a caption), flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept.
In essence, we take the feature map of the final convolutional layer and weigh (multiply) every filter with the gradient of the output with respect to the feature map. Grad-CAM involves the following steps:
Take the output feature map of the final convolutional layer. The shape of this feature map is 14x14x512 for VGG16
Calculate the gradient of the output with respect to the feature maps
Apply Global Average Pooling to the gradients
Multiply each feature map by its corresponding pooled gradient, sum the weighted maps over the channel dimension, and apply a ReLU to obtain the class activation map
We can see the input image and its corresponding Class Activation Map below:
Now, let’s generate the Class Activation Map for the above image.
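Here is a minimal sketch of Grad-CAM on VGG16’s last convolutional layer, reusing the image loaded in the previous section; the layer name and the 16x upsampling factor follow from the 14x14x512 feature map shape mentioned above:

```python
conv_layer = model.get_layer("block5_conv3")   # last conv layer, output shape 14x14x512
grad_model = tf.keras.models.Model(model.inputs, [conv_layer.output, model.output])

x = tf.convert_to_tensor(preprocess_input(img[np.newaxis].copy()))
with tf.GradientTape() as tape:
    conv_output, preds = grad_model(x)
    class_score = tf.reduce_max(preds[0])      # score of the top predicted class

# Step 2: gradient of the class score with respect to the feature maps
grads = tape.gradient(class_score, conv_output)

# Step 3: global average pooling of the gradients gives one weight per filter
weights = tf.reduce_mean(grads, axis=(1, 2))   # shape (1, 512)

# Step 4: weighted sum of the feature maps, followed by a ReLU
cam = tf.reduce_sum(conv_output * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)[0]
cam = tf.nn.relu(cam).numpy()
cam /= (cam.max() + 1e-8)

# Upsample the 14x14 map to 224x224 and overlay it on the input image
cam = np.kron(cam, np.ones((16, 16)))
plt.imshow(img.astype("uint8"))
plt.imshow(cam, cmap="jet", alpha=0.5)
plt.axis("off")
plt.show()
```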
Visualizing the Process – Layerwise Output Visualization
The starting layers of a CNN generally look for low-level features like edges. The features change as we go deeper into the model.
Visualizing the output at different layers of the model helps us see what features of the image are highlighted at the respective layer. This step is particularly important to fine-tune an architecture for our problems. Why? Because we can see which layers give what kind of features and then decide which layers we want to use in our model.
For example, visualizing layer outputs can help us compare the performance of different layers in the neural style transfer problem.
Let’s see how we can get the output at different layers of a VGG16 model:
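A minimal sketch that exposes the output of the first convolutional layer in every block and plots one channel from each; reloading the car image and picking channel 0 are assumptions:

```python
# Reload the car image used earlier in the article
img = keras_image.load_img("car.png", target_size=(224, 224))
img = keras_image.img_to_array(img)

conv_layers = [layer for layer in model.layers if "conv1" in layer.name]
activation_model = tf.keras.models.Model(model.inputs,
                                         [layer.output for layer in conv_layers])

activations = activation_model.predict(preprocess_input(img[np.newaxis].copy()))

fig, axes = plt.subplots(1, len(conv_layers), figsize=(15, 3))
for ax, act, layer in zip(axes, activations, conv_layers):
    ax.imshow(act[0, :, :, 0], cmap="viridis")   # first channel of the layer's output
    ax.set_title(layer.name)
    ax.axis("off")
plt.show()
```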
The above output shows the different features that are extracted from the image by every layer of VGG16 (except block 5). We can see that the starting layers correspond to low-level features like edges, whereas the later layers look at features like the roof, exhaust, etc. of the car.
End Notes
Visualization never ceases to amaze me. There are multiple ways to understand how a technique works, but visualizing it makes it a whole lot more fun. Here are a couple of resources you should check out:
The process of feature extraction in neural networks is an active research area and has led to the development of awesome tools like TensorSpace and Activation Atlases.
TensorSpace is also a neural network visualization tool that supports multiple model formats. It lets you load your model and visualize it interactively. TensorSpace also has a playground where multiple architectures are available for visualization, which you can play around with.
Let me know if you have any questions or feedback on this article. I’ll be happy to get into a discussion!