With so much information moving across the Internet, researchers and scientists are trying to develop more efficient and secure data transfer methods. Autoencoders have emerged as valuable tools for this purpose due to their simple and intuitive architecture. Typically, after an autoencoder is trained, the encoder weights can be given to the sender and the decoder weights to the receiver. This allows the sender to transmit data in a compressed, encoded format, saving time and cost, while the receiver can decode the compressed data back. This article explores the application of autoencoders to MNIST image reconstruction, using the MNIST digit dataset and the TensorFlow framework in Python.
Autoencoders can be divided into three main components:
Encoder: This module takes the input data from the train-validation-test set and compresses it into an encoded representation. Typically, the encoded representation is smaller than the input data.
Bottleneck: This module holds the compressed knowledge representation, making it a critical part of the network. Here the data dimension is at its smallest.
Decoder: This module restores the data representation to its original form by “decompressing” it. The resulting output from the decoder is then compared to the ground truth, i.e., the initial input data. (A minimal code sketch of these three components follows below.)
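To make these components concrete, here is a minimal fully-connected sketch of our own (not the convolutional model built later in this article); the layer sizes are arbitrary illustrative choices:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(784,))                    # flattened 28x28 image
enc = Dense(128, activation="relu")(inp)     # Encoder: compresses the input
code = Dense(32, activation="relu")(enc)     # Bottleneck: smallest representation
dec = Dense(128, activation="relu")(code)    # Decoder: decompresses the code
out = Dense(784, activation="sigmoid")(dec)  # reconstruction of the input
toy_autoencoder = Model(inp, out)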
The encoder plays a significant role in compressing the input data through its convolutional blocks and pooling modules. This compression produces a compact representation called the bottleneck.
After the bottleneck, the decoder takes over. It consists of upsampling modules that expand the compressed features back to the original image format. In basic autoencoders, the decoder aims to reconstruct an output similar to the input, regardless of any noise reduction.
However, in the case of variational autoencoders, the output is not a reconstruction of the input. Instead, the model creates an entirely new image based on the input data given to it. This difference gives variational autoencoders some control over the resulting image and lets them produce different results.
Although the bottleneck is the smallest part of the network, it is very important. It acts as a critical element that limits the flow of data from the encoder to the decoder, allowing only the most essential information to pass through. By restricting this flow, the bottleneck ensures that the crucial properties are preserved and used in the reconstruction.
By designing the bottleneck to extract the maximum amount of information from the image, it forms a knowledge representation of the input. The encoder-decoder structure thus enables the extraction of valuable information from images and the creation of meaningful connections between the various inputs in the network.
This compressed form of processing prevents the network from simply memorizing the input and suffering from information overload. As a general guideline, the smaller the bottleneck, the lower the risk of overfitting.
However, a very small bottleneck limits the amount of information that can be stored, increasing the likelihood that essential details will be lost through the encoder’s pooling layers.
The decoder consists of upsampling and convolutional blocks that reconstruct the output from the bottleneck.
Once the compressed representation reaches the decoder, the decoder acts as a “decompressor”. Its role is to reconstruct the image based on the latent properties extracted from the compressed representation. By using these latent properties, the decoder effectively rebuilds the image, reversing the compression performed by the encoder.
Before setting up the autoencoder, there are four important hyperparameters to consider: the code (bottleneck) size, the number of layers, the number of nodes per layer, and the loss function.
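As an illustration, the hypothetical helper below (the function and argument names are our own) exposes these four hyperparameters as arguments to a simple fully-connected autoencoder builder:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def build_dense_autoencoder(code_size=32, num_layers=2, nodes=128,
                            loss="binary_crossentropy"):
    inp = Input(shape=(784,))
    x = inp
    for _ in range(num_layers):                   # number of layers
        x = Dense(nodes, activation="relu")(x)    # number of nodes per layer
    x = Dense(code_size, activation="relu")(x)    # code (bottleneck) size
    for _ in range(num_layers):
        x = Dense(nodes, activation="relu")(x)
    out = Dense(784, activation="sigmoid")(x)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss=loss)    # loss function
    return model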
We need the following libraries and helper functions to create an autoencoder in TensorFlow.
TensorFlow: To begin, we import the TensorFlow library and all the necessary components for creating our model, enabling it to read and generate MNIST images.
NumPy: Next, we import NumPy, a powerful library for numerical processing, which we will use for preprocessing and reshaping the dataset.
Matplotlib: We will use the Matplotlib plotting library to visualize and evaluate the model’s performance.
In the next part, we will learn how to create a simple Autoencoder using TensorFlow and train it using MNIST images. First, we will outline the steps to load and process MNIST data to meet our requirements. Once the data is properly formatted, we build and train the model.
The network architecture consists of three main components: Encoder, Bottleneck, and Decoder. The Encoder is responsible for compressing the input image while preserving valuable information. The Bottleneck determines which features are essential enough to pass through to the Decoder. Finally, the Decoder uses the Bottleneck’s result to reconstruct the image. Through this reconstruction process, the Autoencoder aims to learn the latent representation of the data.
To create a model that can read and generate MNIST images, we import the TensorFlow library along with its related Keras components, as well as the NumPy numerical processing library and the Matplotlib plotting library. These libraries help us perform the required operations and visualize the results.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import *
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
In addition, we need to implement some auxiliary functions. The preprocessing function below takes an array as input, normalizes it, and reshapes it to the size required by the model.
def data_proc(dat):
    # Normalize pixel values to [0, 1] and reshape to (N, 28, 28, 1)
    larr = len(dat)
    return np.reshape(dat.astype("float32") / 255.0, (larr, 28, 28, 1))
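As a quick usage check with stand-in data (not the real MNIST arrays, which we load later):
raw = np.random.randint(0, 256, size=(5, 28, 28))  # stand-in for raw uint8 MNIST images
proc = data_proc(raw)
print(proc.shape)  # (5, 28, 28, 1), with float32 values scaled into [0, 1]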
We also add a second helper function that operates on an array. This function adds Gaussian noise to the array and clips the result so that every value stays between 0 and 1.
def gen_noise(dat):
    # Add Gaussian noise (scaled by 0.4) and clip values back into [0, 1]
    return np.clip(dat + 0.4 * np.random.normal(loc=0.0, scale=1.0, size=dat.shape), 0.0, 1.0)
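A quick sanity check with stand-in data confirms that the clipping keeps all values in the valid range:
blank = np.zeros((3, 28, 28, 1), dtype="float32")  # stand-in batch of blank images
noisy = gen_noise(blank)
print(noisy.min() >= 0.0, noisy.max() <= 1.0)  # True True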
To evaluate the model’s performance, it is important to visualize a number of images. For this purpose, we use a display function that takes two arrays of images, samples ten of them at random, and plots the pairs in two rows.
def display(dat1, dat2):
    n = 10  # number of image pairs to display
    ind = np.random.randint(len(dat1), size=n)
    im1 = dat1[ind, :]
    im2 = dat2[ind, :]
    plt.figure(figsize=(20, 4))
    for i, (a, b) in enumerate(zip(im1, im2)):
        # Top row: images from the first array
        plt_axis = plt.subplot(2, n, i + 1)
        plt.imshow(a.reshape(28, 28))
        plt.gray()
        plt_axis.get_xaxis().set_visible(False)
        plt_axis.get_yaxis().set_visible(False)
        # Bottom row: corresponding images from the second array
        plt_axis = plt.subplot(2, n, i + 1 + n)
        plt.imshow(b.reshape(28, 28))
        plt.gray()
        plt_axis.get_xaxis().set_visible(False)
        plt_axis.get_yaxis().set_visible(False)
    plt.show()
The MNIST dataset comes bundled with TensorFlow, already divided into training and test sets. We can load it directly and apply the preprocessing function defined earlier. Additionally, we generate a noisy version of the original MNIST images using the gen_noise function we defined earlier. Note that the noise level affects how distorted the images are, and higher noise makes it harder for the model to reconstruct them well. We will visualize the original and noisy images as part of the process.
(ds_train, _), (ds_test, _) = mnist.load_data()
ds_train, ds_test = data_proc(ds_train), data_proc(ds_test)
noisy_ds_train, noisy_ds_test = gen_noise(ds_train), gen_noise(ds_test)
display(ds_train, noisy_ds_train)
The encoder part of the network uses Convolutional and Max Pooling layers with ReLU activation. Its goal is to compress the input data as it passes through the network, yielding a compressed version of the original data. Since each MNIST image has shape 28x28x1, we create an Input layer with that shape.
inps = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation="relu", padding="same")(inps)
x = MaxPooling2D((2, 2), padding="same")(x)
x = Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = MaxPooling2D((2, 2), padding="same")(x)
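At this point the two pooling steps have reduced the spatial resolution from 28x28 to 7x7 with 32 channels; you can verify the shape of the compressed representation directly:
print(x.shape)  # (None, 7, 7, 32): each 2x2 pooling halves the spatial dimensions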
In contrast to the other components, the Bottleneck does not require explicit programming. Because the encoder’s final MaxPooling layer yields a highly condensed output, the Decoder learns to reconstruct the image from this compressed representation. The architecture of the Bottleneck can be modified in a more intricate Autoencoder implementation.
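For instance, one possible modification (a sketch of our own, not used in the model below) is to make the Bottleneck explicit by flattening the encoder output and squeezing it through a small Dense code before restoring the spatial shape the Decoder expects:
b = Flatten()(x)  # (None, 7 * 7 * 32) = (None, 1568)
b = Dense(64, activation="relu")(b)  # explicit 64-dimensional code
b = Dense(7 * 7 * 32, activation="relu")(b)  # expand back before reshaping
x_explicit = Reshape((7, 7, 32))(b)  # the shape the Decoder expects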
The Decoder consists of Transposed Convolutions with a stride of 2. The last layer of the model utilizes a simple 2D convolution with the sigmoid activation function. The purpose of this component is to reconstruct images from the compressed representation. The Transposed Convolution is employed for upsampling, allowing for larger strides and reducing the number of steps required to upsample the images.
x = Conv2DTranspose(32, (3, 3), activation="relu", padding="same", strides=2)(x)
x = Conv2DTranspose(32, (3, 3), activation="relu", padding="same", strides=2)(x)
x = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
After defining the model, it must be compiled with an optimizer and a loss function. In this article, we use the Adam optimizer and the binary cross-entropy loss function for training.
conv_autoenc_model = Model(inps, x)
conv_autoenc_model.compile(optimizer="adam", loss="binary_crossentropy")
conv_autoenc_model.summary()
Once the model is built, we can train it on the MNIST images prepared earlier. The training process runs for 50 epochs with a batch size of 128, and we provide the test set as validation data.
conv_autoenc_model.fit(
    x=ds_train,
    y=ds_train,
    epochs=50,
    batch_size=128,
    shuffle=True,
    validation_data=(ds_test, ds_test),
)
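If you want a denoising autoencoder instead, a natural variant (our suggestion, not the training run above) is to feed the noisy images as input and the clean images as the target:
# Train on noisy inputs with clean targets to learn noise removal
conv_autoenc_model.fit(
    x=noisy_ds_train,
    y=ds_train,
    epochs=50,
    batch_size=128,
    shuffle=True,
    validation_data=(noisy_ds_test, ds_test),
)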
Once we train the model, we can generate predictions and reconstruct images. We can use the previously defined function to display the resulting images.
preds = conv_autoenc_model.predict(ds_test)
display(ds_test, preds)
An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. The main goal is to obtain a low-dimensional representation, often called an encoding, of high-dimensional data in order to reduce its dimensionality. Such encodings enable efficient data representation and analysis by capturing the most important features or characteristics of the input image.
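To inspect this encoding directly, you can wrap the Encoder layers of the trained model in their own Model; the layer index below is an assumption that holds only for the exact layer ordering used in this article:
# layers[4] is the second MaxPooling2D layer, i.e., the Bottleneck output
encoder_only = Model(conv_autoenc_model.input, conv_autoenc_model.layers[4].output)
codes = encoder_only.predict(ds_test)
print(codes.shape)  # (10000, 7, 7, 32)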
Q1. What is an autoencoder?
Answer: An autoencoder is a technique that encodes data automatically. It uses a neural network to learn how to compress data, especially images, into a compact encoded representation. From this encoded representation, the autoencoder then tries to reconstruct the original data as faithfully as possible.
Q2. What are the drawbacks of autoencoders?
Answer: Autoencoders may produce inaccurate reconstructions when the input contains errors or when the key relationships between variables differ from those in the training set. Additionally, there is a risk of losing important information from the input data during the compression and reconstruction process.
Q3. Which performs better for dimensionality reduction: autoencoders or PCA?
Answer: When comparing the performance of autoencoders and PCA (Principal Component Analysis) for dimensionality reduction on the large MNIST dataset, the autoencoder model performs better than the PCA model. This result can be attributed to the size and non-linear nature of the MNIST data, which is better suited to the capabilities of the autoencoder.
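As a rough illustration of such a comparison (our own sketch; the exact setup behind this claim is not given in the article), a linear PCA baseline can be built with scikit-learn and visualized with the display helper defined earlier:
from sklearn.decomposition import PCA

flat_train = ds_train.reshape(len(ds_train), -1)  # flatten images to (60000, 784)
flat_test = ds_test.reshape(len(ds_test), -1)
pca = PCA(n_components=32)  # linear 32-dimensional code
pca.fit(flat_train)
pca_recon = pca.inverse_transform(pca.transform(flat_test))
display(ds_test, pca_recon)  # originals (top) vs PCA reconstructions (bottom)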
Q4. When is an autoencoder not the right choice?
Answer: Autoencoders are very sensitive to input errors and can be outperformed by hand-crafted approaches. Furthermore, under tight time constraints there is often no significant advantage to using an autoencoder in terms of output quality and speed. The complexity associated with implementing an autoencoder also adds a layer of engineering effort that may not be necessary in some situations.