Nvidia’s DLSS: Deep Learning Super Sampling

badrinarayan6645541 25 Jun, 2024
10 min read

Introduction

Nvidia is a US-based corporation headquartered in Santa Clara, California. It recently became the largest company in the world by market cap, largely due to the AI boom and the demand for AI chips. The company is best known for its high-performance GPUs, which have had a significant influence on gaming and artificial intelligence. Deep Learning Super Sampling (DLSS) is one of Nvidia's greatest inventions. If you are a gamer, you will undoubtedly want to know what DLSS is and how it affects your games. That is the topic this article will address: we will understand how DLSS operates and how it uses AI to enhance performance.

Also Read: NVIDIA Launches Ampere-based RTX A1000 and A400 Pro GPUs


Overview

  • Understand what Nvidia’s DLSS is and how it works.
  • Follow the history and understand the evolution of DLSS.
  • Learn the functionalities of the current model – DLSS 3.5.
  • Find out how DLSS compares to its two main competitors – AMD’s FSR and Intel’s XeSS.

What is DLSS?

DLSS is a family of real-time deep learning image enhancement and upscaling technologies exclusive to Nvidia's RTX graphics card line. As of 18th June 2024, it is integrated into 500+ games and apps. These technologies aim to improve performance by letting most of the graphics pipeline operate at a lower resolution. From there, DLSS infers a higher-resolution image with roughly the same level of detail as if the image had been rendered natively at that higher resolution. Depending on user preference, this enables higher graphical settings and/or frame rates for a given output resolution.

How Does DLSS Work?

Let us now see how DLSS works. Before we can discuss how DLSS is trained, we must first explain its foundation: the convolutional auto-encoder neural network.

Convolutional Auto-encoder Neural Networks

A convolutional auto-encoder has two parts: an encoder and a decoder. The encoder uses convolutional layers to condense the input data, extract important features, and create a compressed representation of the input. In essence, the model scans the image to identify its most significant elements, such as edges or shapes, and condenses all of the pertinent data into a single “summary” of the image.

The compressed representation, or “summary,” is then used by the decoder to reconstruct the original input data using transposed convolutional layers. The decoder is used to generate an output that closely resembles the input data while retaining the important features.
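To make the encoder-decoder idea concrete, here is a minimal convolutional auto-encoder sketch in PyTorch. The layer sizes and channel counts are illustrative assumptions only; Nvidia's actual DLSS network is proprietary and far larger.

# A minimal convolutional auto-encoder (illustrative; not Nvidia's actual network)
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: convolutions shrink the image and extract key features
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # H -> H/2
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # H/2 -> H/4
            nn.ReLU(),
        )
        # Decoder: transposed convolutions rebuild the image from the "summary"
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # H/4 -> H/2
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # H/2 -> H
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
frame = torch.rand(1, 3, 256, 256)   # a dummy RGB frame
output = model(frame)                # reconstruction at the input resolution
print(output.shape)                  # torch.Size([1, 3, 256, 256])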


The Foundation of DLSS

This kind of model is used by DLSS to convert low-resolution images into high-resolution images. During training, NVIDIA uses thousands of distinct sample scenes with varying lighting and post-production effects to create frame sequences rendered at 1080p and 16K resolutions (on a supercomputer). The 16K render serves as the ground truth, while the 1080p rendered frame serves as the input. The input also contains exposure data, depth buffers, and motion vectors.

After the model has processed the inputs, a 4K resolution image is produced. By comparing the 4K output with the 16K ground truth using a loss function, NVIDIA can determine how much the output deviates from the ground truth. That error is then fed back via backpropagation, and the neural network's parameters are slightly adjusted. The next frame is then processed in the same manner. Repeating this millions of times produces a neural network that is exceptional at turning low-resolution frames into high-resolution frames.
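A schematic of one such training step is sketched below, assuming a hypothetical super-resolution model and a simple L1 loss; the exact loss function, data pipeline, and network that Nvidia uses are not public.

# One schematic training step (hypothetical model and L1 loss; Nvidia's
# actual pipeline and loss function are not public).
import torch
import torch.nn.functional as F

def training_step(model, optimizer, low_res, aux_buffers, ground_truth):
    # The 1080p frame is stacked with its motion vectors, depth and exposure
    # buffers along the channel dimension to form the network input.
    inputs = torch.cat([low_res, aux_buffers], dim=1)
    prediction = model(inputs)                  # e.g. a 4K output image
    # Compare against the ground truth (here assumed to be the 16K render
    # resampled to the output resolution for the comparison).
    loss = F.l1_loss(prediction, ground_truth)
    optimizer.zero_grad()
    loss.backward()                             # backpropagation
    optimizer.step()                            # nudge the parameters
    return loss.item()

# Repeating this over millions of frames yields a network that maps
# low-resolution frames to convincing high-resolution ones.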

The Post-training Process

After training is finished, the model is distributed via driver updates to the graphics cards already installed in PCs. The model makes use of NVIDIA's GPU tensor cores, which are designed expressly to speed up the matrix operations that dominate deep learning and AI workloads. This allows the model to run in real time, concurrently with a demanding 3D game. The figure below shows how the DLSS model is trained.

Nvidia DLSS 3

In essence, DLSS balances the demand for high-quality graphics and smooth performance, making it possible to enjoy visually stunning games even on hardware that might not support native high-resolution rendering. This technology exemplifies the potential of AI in optimizing and transforming gaming experiences.

History of DLSS

Now that we have an overview of DLSS, let's look at its evolution.

DLSS 1.0

DLSS's first version is a two-stage, primarily spatial image upscaler that uses convolutional auto-encoder neural networks in both stages. The first stage is an image enhancement network that performs edge enhancement and spatial anti-aliasing using the current frame and motion vectors. The second stage, image upscaling, takes the single raw, low-resolution frame and brings it up to the required output resolution. When upscaling from a single frame, the neural network must create a lot of new information to produce the high-resolution output. This can cause subtle hallucinations, such as leaves that aren't quite the same as in the original content.

For each game, traditional supersampling at 64 samples per pixel, together with the motion vectors for every frame, is used to create the "perfect frames" on which the neural network is trained. The collected data must be as varied as possible, covering as many levels, times of day, graphical settings, resolutions, etc. as possible. To help the model generalize, this data is additionally augmented with standard techniques like rotations, color changes, and random noise. Training is done on Nvidia's Saturn V supercomputer.

Many criticized the first iteration's often-soft appearance and artifacts in specific scenarios. This was probably because, with only one frame as input, the neural networks could not be trained to perform optimally across all scenarios and edge cases.

Additionally, Nvidia demonstrated that auto-encoder networks could learn to replicate motion blur and depth-of-field, although these capabilities have never appeared in a product released to the general public.

DLSS 2.0

DLSS 2.0 is an advanced temporal anti-aliasing upsampling (TAAU) method that reduces aliasing and enhances detail by utilizing information from previous frames, including motion vectors, exposure/brightness information, and the raw low-resolution input. With DLSS 2.0, temporal artifacts are avoided by using a convolutional auto-encoder neural network instead of the manually written heuristics used in traditional TAAU methods. This results in improved detail resolution and decreased blurriness.

Thanks to this neural network technique, DLSS 2.0 can produce images that are sharper than some native-resolution renderings that use conventional TAA. It generally offers significant improvements over DLSS 1.0, including better detail retention, a generalized neural network that doesn't require per-game retraining, and lower processing overhead, though it still displays artifacts like ghosting in some scenarios.

Unlike conventional upscalers such as ESRGAN or DLSS 1.0, DLSS 2.0 recovers data from earlier frames. Consequently, unless developers apply a mip-map bias to use higher-resolution textures, low-resolution textures will remain low-resolution.
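To illustrate what "recovering data from earlier frames" means, here is a minimal NumPy sketch of temporal accumulation: the previous result is reprojected along the motion vectors and blended with the current sample. DLSS 2.0 replaces this kind of fixed, hand-written blend heuristic with its neural network, so the code below is only a conceptual stand-in, not Nvidia's algorithm.

# Conceptual temporal accumulation (NumPy only). DLSS 2.0's network replaces
# this kind of fixed heuristic; this is not Nvidia's algorithm.
import numpy as np

def reproject(prev_result, motion_vectors):
    """Fetch each pixel of the previous result from where it was last frame."""
    h, w, _ = prev_result.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(xs - motion_vectors[..., 0], 0, w - 1).astype(int)
    src_y = np.clip(ys - motion_vectors[..., 1], 0, h - 1).astype(int)
    return prev_result[src_y, src_x]

def temporal_accumulate(current_sample, prev_result, motion_vectors, alpha=0.1):
    """Blend the new aliased sample with the reprojected history."""
    history = reproject(prev_result, motion_vectors)
    return alpha * current_sample + (1.0 - alpha) * history  # fixed blend weight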

DLSS 3.0

In the "How Does DLSS Work?" section, we discussed the Super Resolution technique introduced with DLSS 2.0. DLSS has grown drastically since then. DLSS 3 adds Optical Multi Frame Generation, which generates entirely new frames.


This is an additional convolutional autoencoder that accepts four inputs: game engine data, an optical flow field, and the current and previous frames. The optical flow field is computed by the Optical Flow Accelerator after it has analyzed two consecutive in-game frames. The direction and speed at which pixels move from one frame to the next are determined by the optical flow field. Motion vector computations frequently do not account for pixel-level information such as particles, shadows, and reflections, but this accelerator can. Below is an illustration of how the motion vectors lack this information:

Frame generation without optical flow
Real-time motion estimation with ADA optical flow accelerator

To create a completely new frame between the two provided frames, the DLSS Frame Generation neural network combines the game motion vectors, the optical flow field, and the sequential frames. By upscaling the conventional renders and inserting a new frame between each pair of upscaled frames, DLSS can therefore increase frame rates up to 4 times over brute-force rendering.
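As a rough, software-only illustration of the idea, the sketch below estimates dense optical flow between two frames with OpenCV's Farneback method and warps the first frame halfway along that flow to approximate an in-between frame. Nvidia's hardware Optical Flow Accelerator and Frame Generation network are far more sophisticated; this is only a crude approximation for intuition.

# Crude software frame interpolation for intuition only; Nvidia's Optical
# Flow Accelerator and Frame Generation network are far more sophisticated.
import cv2
import numpy as np

def interpolate_midframe(frame_a, frame_b):
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    # Dense optical flow: per-pixel (dx, dy) motion from frame_a to frame_b.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Pull pixels from frame_a half a step back along the flow -- a crude
    # approximation of the frame that lies halfway between a and b.
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_a, map_x, map_y, cv2.INTER_LINEAR)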

DLSS 3.5

DLSS 3.5 was released in September 2023, introducing Ray Reconstruction. Let me explain what this technology is.

Ray tracing is a sophisticated rendering method that mimics how light behaves when it interacts with surfaces and objects in a three-dimensional virtual world. Ray-traced effects are created by sending rays from the game engine's scene into the geometry and lighting. The issue is that there are too many pixels and the ray distribution is uneven, so you can never send enough rays to get a precise picture of how the scene should look. The result is a noisy image like this:

Noisy image from sparse ray sampling
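To see why so few rays produce this noise, note that each pixel's brightness is a Monte Carlo average over its rays, and the error of that average shrinks only as one over the square root of the sample count. Below is a tiny NumPy illustration using a made-up "true" brightness value.

# Why sparse rays mean noise: a pixel estimate is a Monte Carlo average,
# and its error shrinks only as 1/sqrt(number of rays).
import numpy as np

rng = np.random.default_rng(0)
true_brightness = 0.3   # made-up ground-truth pixel value

for rays_per_pixel in (1, 4, 64, 4096):
    # Each ray contributes a noisy 0/1 sample; the pixel is their average.
    samples = rng.binomial(1, true_brightness, size=rays_per_pixel)
    print(f"{rays_per_pixel:5d} rays -> estimate {samples.mean():.3f}")
# Real-time budgets allow only a ray or two per pixel, hence the noise
# and the need for a denoiser.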

Hand-tuned denoisers are then used to fill in the missing pixels and estimate how the scene should look. The problem with denoisers is that they accumulate pixels from previous frames to increase detail, which can wash out dynamic lighting effects and introduce ghosting. Additionally, because denoisers smear information across the frame, reflections may lose detail. When ray-traced lighting is used with DLSS, this loss of detail is accentuated: the lighting is sampled at a low resolution, passes through a denoiser, and is then upscaled.

DLSS 3.5 Denoiser

NVIDIA has now addressed both issues by combining Ray Reconstruction and Super Resolution into a single model in DLSS 3.5. Compared to the DLSS 3 model, this new model has been trained on five times as much data. It can recognize various ray-traced effects, incorporate additional engine data, and retain high-frequency detail for upscaling. Smarter than hand-tuned denoisers, the new model produces lighting effects by identifying patterns across frames of sampled rays. With an overall performance boost, DLSS can now produce frames that match native-resolution quality, at even higher frame rates.

Evolution of Nvidia's DLSS

Current DLSS 3.5 Functionalities

Let us now explore the diverse functionalities of DLSS 3.5.

DLSS Frame Generation

Boosts performance by using AI to generate more frames while maintaining great responsiveness with NVIDIA Reflex. DLSS analyzes sequential frames and motion data from the new Optical Flow Accelerator in GeForce RTX 40 Series GPUs to create additional high quality frames.


DLSS Ray Reconstruction

Enhances image quality for all GeForce RTX GPUs by using AI to generate additional pixels for intensive ray-traced scenes. DLSS replaces hand-tuned denoisers with an NVIDIA supercomputer-trained AI network that generates higher-quality pixels in between sampled rays.


DLSS Super Resolution

Boosts performance for all GeForce RTX GPUs by using AI to output higher resolution frames from a lower resolution input. DLSS samples multiple lower resolution images and uses motion data and feedback from prior frames to reconstruct native quality images.


Deep Learning Anti-aliasing

Provides higher image quality for all GeForce RTX GPUs with an AI-based anti-aliasing technique. DLAA uses the same Super Resolution technology developed for DLSS, reconstructing a native resolution image to maximize image quality.


DLSS Availability

Features of Nvidia DLSS models

DLSS Competitors

We will now have a look at DLSS’ competitors and see how it fares in comparison.

AMD’s FSR

NVIDIA is currently spearheading this cutting-edge technology, but it won't remain unchallenged forever. Rival firms such as AMD and Intel are developing their own solutions in direct competition with DLSS. FidelityFX Super Resolution, or FSR, is AMD's take on DLSS: it upscales frames while requiring far less processing power. This is because FSR upscales an image without deep learning, using a modified version of the Lanczos algorithm.

What makes this fantastic is that it works on a variety of GPUs, without the tensor cores exclusive to newer NVIDIA cards. Despite not being computationally demanding, FSR is not nearly as developed as DLSS and has a lot of catching up to do: DLSS was initially released in February 2019, while FSR's first iteration arrived in June 2021.
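For intuition, a plain Lanczos upscale can be done in a few lines with Pillow, as sketched below. This is only the classical resampling that FSR's spatial pass builds on; AMD's actual implementation is a modified, shader-based variant with an added sharpening pass, and the file names here are placeholders.

# Plain Lanczos upscaling with Pillow. FSR builds on a modified, shader-based
# variant of this idea plus sharpening; file names below are placeholders.
from PIL import Image

low_res = Image.open("frame_1080p.png")                             # hypothetical 1080p frame
upscaled = low_res.resize((3840, 2160), Image.Resampling.LANCZOS)   # resample to 4K
upscaled.save("frame_4k_lanczos.png")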

AMD FidelityFX Super Resolution

Intel’s XeSS

Intel's XeSS is another emerging deep learning image upscaling technology that shares many similarities with DLSS. XeSS is nearly a combination of DLSS and FSR: like DLSS, it employs AI for upscaling, but like FSR, it is not restricted to any particular GPU architecture. Any GPU that can perform DP4a AI computations can run it. Still in its infancy, XeSS is not as mature as DLSS and is not supported in as many new games, but in time Intel may challenge NVIDIA's lead in upscaling.
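DP4a itself is simply a hardware instruction that takes a dot product of four packed 8-bit integers and accumulates it into a 32-bit value, which is why it is useful for low-precision neural network inference. The NumPy snippet below only mimics that arithmetic; it is not GPU code.

# What a DP4a instruction computes: a four-element int8 dot product
# accumulated into 32 bits. NumPy mimicry only; not GPU code.
import numpy as np

def dp4a(a, b, acc):
    """a, b: four int8 values; acc: int32 accumulator."""
    return int(np.dot(a.astype(np.int32), b.astype(np.int32))) + acc

weights = np.array([ 12, -3, 45,  7], dtype=np.int8)   # quantized weights
pixels  = np.array([100, 25, -8, 64], dtype=np.int8)   # quantized inputs
print(dp4a(weights, pixels, 0))   # 12*100 - 3*25 + 45*(-8) + 7*64 = 1213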

Intel XeSS

Conclusion

Nvidia's DLSS lets us strike a balance between graphics (visual quality), performance (fps), and resolution. From its first version through DLSS 3.5, we have seen what AI can do to optimize and enhance games. With deep learning models, we can get higher resolution and frame rates on less powerful hardware. We have also seen that DLSS dominates the market right now, with some competition from AMD's FSR and Intel's XeSS. DLSS is still being improved as we speak, and it is going to remain crucial for the gaming industry.

Frequently Asked Questions

Q1. What is DLSS and how does it benefit gaming?

A. DLSS is a technology that uses AI to enhance the gaming experience by striking the best balance between resolution, fps, and visual quality. It renders frames at a lower resolution and upscales them to a higher resolution, reducing the load on the GPU. This makes lower-end hardware perform better.

Q2. How does DLSS work?

A. Nvidia's DLSS is based on a convolutional auto-encoder neural network, which is used to convert low-resolution images into high-resolution ones. These auto-encoders are trained on a huge dataset of scenes, with thousands of samples rendered at different resolutions. During training, the network learns to produce high-resolution frames from low-resolution frames.

Q3. What are the key differences between DLSS 1.0, 2.0, and 3.5?

A. DLSS 1.0 was a two-stage spatial upscaler that frequently produced softer images and artifacts. DLSS 2.0 improved on this with temporal anti-aliasing upsampling, which lowers blurriness and enhances detail by utilizing data from earlier frames. DLSS 3 introduced Optical Multi Frame Generation to create entirely new frames, and DLSS 3.5 added Ray Reconstruction to refine ray-traced effects, greatly improving performance and visual quality.

Q4. How does DLSS compare to competitors like AMD’s FSR and Intel’s XeSS?

A. Nvidia's DLSS is the most sophisticated at integrating AI and improving overall performance. AMD's FSR is less computationally intensive than DLSS but does not use deep learning for upscaling. Intel's XeSS is still in its infancy compared to DLSS, but it offers broad GPU compatibility along with AI upscaling. Nvidia's early start and rapid improvements keep DLSS at the forefront of AI-driven image enhancement.

badrinarayan6645541 25 Jun, 2024

Data science intern at Analytics Vidhya, specializing in ML, DL, and AI. Dedicated to sharing insights through articles on these subjects. Eager to learn and contribute to the field's advancements. Passionate about leveraging data to solve complex problems and drive innovation.

