Optimizing LLMs through Quantization: A Hands-On Tutorial

03 Oct 2024, 1:10 PM - 2:10 PM

About the Event

In this talk, we will delve into the world of LLM quantization and explore how it can significantly enhance the efficiency and deployment of advanced deep learning models. Quantization reduces the computational and memory demands of LLMs by representing weights and activations with lower precision. This session takes a practical, hands-on approach, using Jupyter Notebooks to guide you from basic concepts to advanced quantization techniques. You will gain insights into Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), and understand their applications, benefits, and challenges. By the end of the webinar, you'll be equipped with the knowledge and tools to implement quantization strategies in your own projects, optimizing performance without sacrificing model accuracy.
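
As a preview of the hands-on material, below is a minimal, hypothetical sketch of Post-Training Quantization (not the webinar's actual notebook code). It uses PyTorch's built-in dynamic quantization API (torch.quantization.quantize_dynamic) on a small stand-in feed-forward block: the Linear weights are stored in int8, and activations are quantized on the fly at inference time.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a transformer feed-forward block;
# the session's notebooks work with real LLM layers instead.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

# Post-Training Quantization (dynamic): Linear weights are converted to
# int8 up front; activations are quantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough memory comparison: int8 weights are ~4x smaller than fp32.
fp32_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
print(f"fp32 weights: {fp32_mb:.1f} MB -> int8 weights: ~{fp32_mb / 4:.1f} MB")

# The quantized module is a drop-in replacement for CPU inference.
x = torch.randn(1, 768)
print(quantized(x).shape)  # torch.Size([1, 768])
```

Quantization-Aware Training, by contrast, simulates quantization during training or fine-tuning so the model learns to compensate for the reduced precision; the session covers both approaches and their trade-offs.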

Prerequisites for the Session

- Basic understanding of ML concepts
- Familiarity with neural networks and deep learning fundamentals
- Proficiency in Python programming
- Familiarity with PyTorch and HuggingFace Transformers (good to have, not mandatory)
- Basic experience with Jupyter notebooks for code execution and visualization

Who is this DataHour for?

About the Speaker

Sri Raghu Malireddi

Senior Machine Learning Engineer at Grammarly

Sri Raghu Malireddi is a Senior Machine Learning Engineer at Grammarly, working on on-device machine learning. He specializes in deploying and optimizing Large Language Models (LLMs) on-device, focusing on improving system performance and algorithm efficiency. He has played a key role in the on-device personalization of the Grammarly Keyboard. Before joining Grammarly, he was a Senior Software Engineer and Tech Lead at Microsoft, working on several key initiatives for deploying machine learning models in Microsoft Office products. You can reach him on LinkedIn.

Registration Details

2278 Registered
