Demystify Parallel Programming: Hands-on with CUDA for GenAI

About

This session offers an in-depth exploration of leveraging CUDA to get the most out of NVIDIA hardware, which is essential for the explosive growth of Generative AI applications. Generative AI models, which include image and text generation and code completion, rely heavily on accelerated hardware for efficient training and inference. While high-level frameworks like PyTorch and TensorFlow simplify the process, true optimization and control are unlocked through CUDA, NVIDIA’s low-level parallel programming platform that interfaces directly with GPUs.

The session begins with a thorough review of C programming fundamentals, ensuring a solid base. It then demystifies core CUDA concepts, including threads, blocks, grids, and memory hierarchies, teaching participants to think in parallel for efficient GPU utilization. The course is designed to be highly interactive, with practical sessions guiding learners through writing their own kernels, the essential workhorses of CUDA programs. This hands-on approach gives participants real-world experience in parallel programming, ensuring they understand and can effectively harness the power of parallel processing for Generative AI.
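To give a flavor of the concepts named above, here is a minimal, illustrative CUDA kernel (not part of the session materials): a vector addition in which each thread handles one element, with the grid sized from a chosen block size. Names like `vecAdd` and the choice of 256 threads per block are assumptions for the sketch; compiling requires NVIDIA's `nvcc` and a CUDA-capable GPU.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard: grid may overshoot n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                // 1M elements (illustrative size)
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);         // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;            // a common, assumed block size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // grid size
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();              // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The launch configuration `<<<blocks, threadsPerBlock>>>` is where the "thinking in parallel" happens: instead of a loop over `n` elements, the grid of blocks and threads covers the whole array at once.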

Key Takeaways:

  • Confidently navigate the world of CUDA programming. 
  • Write your own CUDA kernels to leverage the power of GPUs. 
  • Approach problems from a parallel programming perspective for optimal performance. 

This session is your gateway to unlocking the true potential of Generative AI. Join us and take your skills to the next level!

Stay informed about DHS 2025
